Date Published: January 17, 2019
Publisher: Public Library of Science
Author(s): Bobak D. Kechavarzi, Huanmei Wu, Thompson N. Doman, Aamir Ahmad.
The massive genomic data from The Cancer Genome Atlas (TCGA), including proteomics data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), provides a unique opportunity to study cancer systematically. While most observations are made from a single type of genomics data, we apply big data analytics and systems biology approaches by simultaneously analyzing DNA amplification, mRNA abundance, and protein abundance. Using multiple genomic profiles, we have discovered widespread dosage compensation for the extensive aneuploidy observed in TCGA breast cancer samples. We nevertheless identify 11 genes that show strong correlation across all features (DNA/mRNA/protein), analogous to that of the well-known oncogene HER2 (ERBB2). These genes are generally less well characterized regarding their role in cancer, and we advocate their further study. We also find that shRNA knockdown of these genes has an impact on cancer cell growth, suggesting a vulnerability that could be used for cancer therapy. Our study shows the advantages of systematic big data methodologies and also provides future research directions.
The scientific literature is replete with papers highlighting the complex interplay between chromosomal instability, aneuploidy, and cancer. Aneuploidy, the state of having other than the canonical or “euploid” number of chromosomes (46 for humans), is with only rare exceptions (Down syndrome, Trisomy 18) lethal in human embryonic development. By contrast, aneuploidy is observed with very high frequency in cancer, leading the eminent German biologist Theodor Boveri to speculate as early as 1902 that aneuploidy might have a causative role in the disease.
The data used in this study were downloaded from multiple resources, including TCGA, the Clinical Proteomic Tumor Analysis Consortium (CPTAC), the Catalogue of Somatic Mutations in Cancer (COSMIC), and Achilles short hairpin RNA (shRNA, also called small hairpin RNA) data. The data and processing approaches are briefly described below. Fig 1 illustrates the overall workflow.
In this paper we have shown that the TCGA breast cancer cohort exhibits widespread dosage compensation for the extensive aneuploidy observed. DNA dosage generally does not correlate well with mRNA levels, nor do mRNA levels correlate well with protein levels. A total of 11 genes show strong correlation across all features (DNA/mRNA/protein), analogous to that of the well-known oncogene HER2 (ERBB2). We refer to these genes as “Broadly Dosage-Sensitive Genes” or BDSGs. We note that they are much less well characterized in the literature as to their role, if any, in cancer. We advocate further study of BDSGs to better understand their potential effects on cancer. This may lead to new therapies for cancer or biomarkers for improved cancer detection.
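As an illustration of what "strong correlation across all features" could mean in practice, the following is a minimal sketch, not the authors' pipeline. It assumes per-gene measurements for the same samples in the same order, uses Pearson correlation, and applies a cutoff of 0.6. The statistic, the cutoff, the function names, and the toy gene names and values are all assumptions chosen for illustration. None of them come from this section.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def dosage_sensitive_genes(dna, mrna, protein, threshold=0.6):
    """Return genes whose DNA-mRNA and mRNA-protein correlations both exceed threshold.

    Each argument maps a gene name to a list of per-sample values.
    The threshold is a hypothetical cutoff, not a value taken from the paper.
    """
    hits = []
    for gene in sorted(set(dna) & set(mrna) & set(protein)):
        r_dna_mrna = pearson(dna[gene], mrna[gene])
        r_mrna_protein = pearson(mrna[gene], protein[gene])
        if r_dna_mrna > threshold and r_mrna_protein > threshold:
            hits.append(gene)
    return hits


# Toy data: GENE_A tracks copy number at every level; GENE_B is dosage-compensated.
dna = {"GENE_A": [2, 3, 4, 6, 8], "GENE_B": [2, 3, 4, 6, 8]}
mrna = {"GENE_A": [1.0, 1.6, 2.1, 3.2, 4.1], "GENE_B": [1.0, 1.1, 0.9, 1.0, 1.1]}
protein = {"GENE_A": [0.5, 0.9, 1.0, 1.7, 2.2], "GENE_B": [0.7, 0.6, 0.8, 0.7, 0.6]}

print(dosage_sensitive_genes(dna, mrna, protein))
```

In this toy example only `GENE_A` passes both checks, mirroring the HER2-like pattern described above, while `GENE_B` shows the compensated behavior reported for most genes.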