Research Article: Detecting Cancer Outlier Genes with Potential Rearrangement Using Gene Expression Data and Biological Networks

Date Published: June 28, 2012

Publisher: Hindawi Publishing Corporation

Author(s): Mohammed Alshalalfa, Tarek A. Bismar, Reda Alhajj.


Gene alterations are a major component of the landscape of tumor genomes. To assess the significance of these alterations in the development of prostate cancer, it is necessary to identify them and analyze them from a systems biology perspective. Here, we present a new method (EigFusion) for predicting outlier genes with potential gene rearrangement. EigFusion demonstrated excellent performance in identifying outlier genes with potential rearrangement when tested on synthetic and real data. EigFusion identified previously unrecognized genes such as FABP5 and KCNH8 and confirmed their association with primary and metastatic prostate samples, while also confirming the metastatic specificity of other genes such as PAH, TOP2A, and SPINK1. We applied protein network-based approaches to analyze the network context of potentially rearranged genes, constructing functional gene rearrangement modules by integrating functional protein networks. Rearranged genes were found to be highly connected to well-known altered genes in cancer such as AR, RB1, MYC, and BRCA1. Finally, using clinical outcome data from prostate cancer patients, we found that potentially rearranged genes were significantly associated with prostate cancer-specific death.

Partial Text

Genetic alterations are among the most challenging factors in cancer, as they can lead to aggressive cell behavior. Among the most prevalent forms of genetic alteration observed in cancer cells are gene fusions, gene amplifications, and gene deletions. Recurrent translocations generally fall into two categories: functional rearrangements, which change a gene's activity by altering either the quality or the quantity of its protein product, and silent translocations, which have no effect on the gene's activity. Functional translocations can be further divided into two subtypes. The first produces fused transcripts that encode new proteins with different activity, such as BCR-ABL in leukemia [1] and EML4-ALK in lung cancer [2]. The second changes transcript quantity by placing a strong gene promoter upstream of the intact coding region of an oncogene, as in TMPRSS2-ERG [3]. Another functional genomic rearrangement is genomic deletion, which results in the loss of a DNA segment that may harbour functional genes. PTEN deletion is a well-studied genomic deletion in prostate cancer that is anticipated to trigger a cascade of genomic rearrangements [4]. Figure 1 gives a schematic description of the four rearrangement types.

Here we argue that microarray gene expression data are a valuable source of information for discovering outlier genes with potential functional rearrangements that affect the expression level of downstream genes. Because gene rearrangements are rare genetic events that affect only a subset of cancer patients, they can be sought as genes that are overexpressed or underexpressed in a subset of cancer samples: genes overexpressed in a subset of samples are anticipated to be amplified or fused, and genes underexpressed in a subset of samples are anticipated to be deleted. Unfortunately, methods developed to extract differentially expressed genes, such as SAM and the t-test, are not suitable for detecting outlier genes, because they test for a shift in the average expression of the whole cancer group, and a strong change confined to a few samples moves that average only slightly while inflating the group's variance. Previous bioinformatics approaches to identifying gene rearrangements were limited to detecting potential fused genes overexpressed in a subset of samples, and they assessed performance using synthetic data with embedded test genes. Herein, we followed the same approach by testing EigFusion on synthetic data with embedded test genes. One might argue that real expression data do not follow the idealized distributions used to generate synthetic data. To address this point, we also used real prostate cancer data with embedded synthetic test genes to test and compare methods. Unfortunately, no benchmark dataset is available for this kind of performance evaluation.
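
To illustrate why mean-based tests are ill-suited to this problem, the following minimal sketch embeds two test genes in synthetic data: one shifted in every cancer sample (classic differential expression) and one strongly elevated in only 10% of cancer samples (an outlier, as expected for an amplification or fusion). It then ranks all genes with a Welch t-statistic and with a COPA-style outlier score (median-centring, MAD scaling, then a high percentile of the cancer samples). COPA (cancer outlier profile analysis) is a previously published outlier statistic used here only as a stand-in; it is not EigFusion, whose details are not given in this section. The effect sizes, sample sizes, the 95th-percentile cut-off, and the function names `welch_t` and `copa_score` are assumptions chosen for illustration.

```python
import numpy as np


def welch_t(cancer: np.ndarray, normal: np.ndarray) -> np.ndarray:
    """Per-gene Welch t-statistic (cancer minus normal); rows are genes."""
    diff = cancer.mean(axis=1) - normal.mean(axis=1)
    se = np.sqrt(cancer.var(axis=1, ddof=1) / cancer.shape[1]
                 + normal.var(axis=1, ddof=1) / normal.shape[1])
    return diff / se


def copa_score(cancer: np.ndarray, normal: np.ndarray, pct: float = 95.0) -> np.ndarray:
    """COPA-style score (hypothetical helper): median-centre and MAD-scale each
    gene over all samples, then take the pct-th percentile of the cancer values."""
    all_samples = np.hstack([cancer, normal])
    med = np.median(all_samples, axis=1, keepdims=True)
    mad = np.median(np.abs(all_samples - med), axis=1, keepdims=True)
    return np.percentile((cancer - med) / mad, pct, axis=1)


# Synthetic background: 500 null genes, 100 normal and 100 cancer samples.
rng = np.random.default_rng(0)
n_genes, n_normal, n_cancer = 500, 100, 100
normal = rng.normal(0.0, 1.0, size=(n_genes, n_normal))
cancer = rng.normal(0.0, 1.0, size=(n_genes, n_cancer))

# Embedded test gene 0: differential expression, +1.5 in every cancer sample.
DE_GENE = 0
cancer[DE_GENE] += 1.5
# Embedded test gene 1: outlier, +5 in only 10 of 100 cancer samples
# (mimicking an amplification or fusion present in a subset of tumours).
OUTLIER_GENE = 1
cancer[OUTLIER_GENE, :10] += 5.0

t = welch_t(cancer, normal)
copa = copa_score(cancer, normal)

print("top gene by t-test:", int(np.argmax(t)))
print("top gene by COPA-style score:", int(np.argmax(copa)))
print("outlier gene rank under t-test:", int((t > t[OUTLIER_GENE]).sum()) + 1)
```

With these settings the t-test favours the uniformly shifted gene, while the outlier gene falls behind it and possibly behind some null genes; the percentile-based score reverses that order. Because the 95th percentile looks only at the top 5% of cancer samples, an alteration present in fewer samples than that would need a higher percentile.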

Discovering cancer rearrangements can help elucidate the dysfunctional components of cancer cells. Using gene expression data, EigFusion successfully detected outlier genes with potential amplification or deletion (rearranged genes) in subsets of both prostate and ovarian cancer samples. Among the methods compared, EigFusion was the only one robust to variations in cancer sample size. Several genes, including ERG, FABP5, SPINK1, KCNH8, and PAH, are highly associated with clinical outcome, and this set of genes could be used as prognostic biomarkers for prostate cancer. ADIPOQ and LY6H were found to be potentially rearranged in 14% and 23% of ovarian samples, respectively. Validating the rearranged genes with copy number alteration (CNA) data showed that ovarian cancer patients have a higher rate of alterations per sample, and most ovarian cancer patients harbour several altered genes. Integrating functional protein networks helped reveal the modularity of the rearranged genes, allowing dysfunctional genes to be viewed as functional components rather than as single genes. Rearranged genes also helped identify three prostate cancer subgroups with distinct outcomes. Finally, gene expression data are a valuable and widely available source of information for discovering genes with potential rearrangements.
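
Per-gene frequencies such as the fraction of ovarian samples in which a gene is called rearranged, and per-sample counts of altered genes, can both be read off a gene-by-sample matrix of calls. The minimal sketch below builds such a matrix with a simple robust z-score rule against normal samples. This rule, the |z| > 3 threshold, the helper name `call_outliers`, the placeholder gene names, and the synthetic data are all assumptions for illustration; it is not EigFusion's calling procedure and does not use CNA data.

```python
import numpy as np


def call_outliers(cancer: np.ndarray, normal: np.ndarray, z_thresh: float = 3.0) -> np.ndarray:
    """Hypothetical per-sample calls (genes x cancer samples): +1 = potential
    gain (amplified or fused), -1 = potential loss (deleted), 0 = no call.
    Robust z-scores are computed against the normal samples."""
    med = np.median(normal, axis=1, keepdims=True)
    # 1.4826 * MAD approximates the standard deviation for normally distributed data.
    scale = 1.4826 * np.median(np.abs(normal - med), axis=1, keepdims=True)
    z = (cancer - med) / scale
    calls = np.zeros(cancer.shape, dtype=int)
    calls[z > z_thresh] = 1
    calls[z < -z_thresh] = -1
    return calls


rng = np.random.default_rng(1)
genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical placeholder names
normal = rng.normal(0.0, 1.0, size=(3, 60))
cancer = rng.normal(0.0, 1.0, size=(3, 100))
cancer[0, :20] += 8.0    # GENE_A: gain in cancer samples 0-19
cancer[1, 10:25] -= 8.0  # GENE_B: loss in cancer samples 10-24 (overlaps GENE_A in 10-19)
# GENE_C is left unaltered.

calls = call_outliers(cancer, normal)
altered = calls != 0
freq = altered.mean(axis=1)        # fraction of cancer samples altered, per gene
per_sample = altered.sum(axis=0)   # number of altered genes, per cancer sample

for gene, f in zip(genes, freq):
    print(f"{gene}: altered in {100 * f:.0f}% of cancer samples")
print("samples with more than one altered gene:", int((per_sample > 1).sum()))
```

Scoring against normal samples keeps the reference distribution free of the alteration itself; if no normal samples were available, the median and MAD could instead be taken over all samples, at some cost in sensitivity when the alteration is common.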