Proteome-Transcriptome Alignment of Molecular Portraits by Self-Contained Gene Set Analysis: Breast Cancer Subtypes Case Study



Ayaluri, Koushik




Gene sets are formed by grouping together functionally related genes or pathways. Gene set analysis (GSA) is a method originally developed for examining transcriptome data. Just as the gene set is the unit of expression in transcriptome-level GSA, an analogous unit of protein abundance may be used for proteomics GSA. Self-contained and competitive are two GSA approaches that differ in their underlying null hypotheses. In the self-contained approach, each gene set is evaluated to check whether it is differentially expressed between two phenotypes. In the competitive approach, each gene set is compared to all the genes except those in that set. Competitive approaches are rapidly becoming popular for analyzing proteomics data, as they did for transcriptomics data. This research applied a self-contained GSA test, Gene Sets Net Correlations Analysis (GSNCA), to proteomics data from 77 annotated breast cancer samples. Despite substantial differences in the structure of proteomics and transcriptomics data, many pathway-wide characteristic features of breast cancer molecular subtypes were replicated at the protein level. In this work, GSA yielded a set of observations visible at the proteome level, such as the involvement of the mitotic cell cycle process in the HER2 molecular subtype. Overall, this study demonstrates the value of the GSNCA approach as a critical tool for analyzing proteomics data in general, and for dissecting protein-level molecular portraits of breast cancer tumors in particular.
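The difference between the two null hypotheses described above can be illustrated with a small sketch. It is not the GSNCA test used in this work, which compares net correlation structure. It is a simplified permutation scheme on synthetic data, assuming a mean absolute difference in abundance as the set statistic. All function names, parameters, and data here are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def make_data(n_genes=40, n_per_group=10, shifted=range(5), shift=3.0, seed=1):
    """Synthetic abundance matrix (genes x samples); genes in `shifted` differ in phenotype 1."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    data = []
    for g in range(n_genes):
        row = [rng.gauss(0, 1) + (shift if (lab == 1 and g in shifted) else 0.0)
               for lab in labels]
        data.append(row)
    return data, labels


def gene_scores(data, labels):
    """Per-gene absolute difference in mean abundance between the two phenotypes."""
    scores = []
    for row in data:
        a = [x for x, lab in zip(row, labels) if lab == 0]
        b = [x for x, lab in zip(row, labels) if lab == 1]
        scores.append(abs(mean(a) - mean(b)))
    return scores


def self_contained_p(data, labels, gene_set, n_perm=500, seed=0):
    """Self-contained null: the set shows no difference between phenotypes.
    Only the genes in the set are used; sample labels are permuted."""
    rng = random.Random(seed)
    subset = [data[g] for g in gene_set]
    observed = mean(gene_scores(subset, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean(gene_scores(subset, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def competitive_p(data, labels, gene_set, n_perm=500, seed=0):
    """Competitive null: the set is no more differential than genes outside it.
    Sample labels stay fixed; random same-size sets are drawn from the complement."""
    rng = random.Random(seed)
    scores = gene_scores(data, labels)
    observed = mean(scores[g] for g in gene_set)
    in_set = set(gene_set)
    complement = [g for g in range(len(data)) if g not in in_set]
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(complement, len(gene_set))
        if mean(scores[g] for g in random_set) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    data, labels = make_data()
    target = list(range(5))  # the set that was shifted in phenotype 1
    print("self-contained p:", self_contained_p(data, labels, target))
    print("competitive p:  ", competitive_p(data, labels, target))
```

The self-contained test never looks at genes outside the set, so its result does not depend on what the rest of the proteome is doing. The competitive test depends on the background by construction.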



Gene set analysis, Proteome transcriptome, Self-contained analysis, Transcriptome alignment, Breast cancer