Date Published: January 31, 2017
Publisher: Public Library of Science
Author(s): Evgeny Gladilin, Peter Csermely.
Malignant transformation is known to involve substantial rearrangement of the molecular genetic landscape of the cell. A common approach to the analysis of these alterations is reductionist and consists of finding a compact set of differentially expressed genes or associated signaling pathways. However, due to intrinsic tumor heterogeneity and tissue specificity, biomarkers defined by a small number of genes/pathways exhibit substantial variability. Global features of the genetic cell machinery offer a conceivable alternative to compact differential signatures. Global network descriptors suggested in previous works are, however, known to be potentially biased by overrepresentation of interactions between frequently studied genes-proteins. Here, we construct a cellular network of 74538 directional, differential gene expression weighted protein-protein and gene regulatory interactions, and perform a graph-theoretical analysis of the global human interactome using a novel, degree-independent feature: the normalized total communicability (NTC). We apply this framework to assess differences in total information flow between different cancer (BRCA/COAD/GBM) and non-cancer interactomes. Our experimental results reveal that different cancer interactomes are characterized by significant enhancement of long-range NTC, which arises from circulation of information flow within robustly organized gene subnetworks. Although enhancement of NTC emerges in different cancer types from different genomic profiles, we identified a subset of 90 common genes that are related to elevated NTC in all studied tumors. Our ontological analysis shows that these genes are associated with enhanced cell division, DNA replication, stress response, and other cellular functions and processes typically upregulated in cancer.
We conclude that enhancement of long-range NTC is manifested in the correlated activity of genes whose tight coordination is required for survival and proliferation of all tumor cells and, thus, can be seen as a graph-theoretical equivalent to some hallmarks of cancer. The computational framework for differential network analysis presented herein is of potential interest for a wide range of network perturbation problems given by single or multiple gene-protein activation-inhibition.
Clinically relevant, macroscopically detectable tumors are known to exhibit phenotypic and molecular genetic heterogeneity. Despite considerable genetic diversity, different tumor cells manage to maintain common functional capabilities that manifest in hallmarks of cancer. The underlying mechanisms of cancer hallmark maintenance in different tumors with different genomic profiles are not yet well understood. As a consequence of cancer heterogeneity and plasticity, differential signatures defined by a relatively small number of genes-proteins exhibit substantial variability, which complicates the identification of cancer-specific alterations in microarrays and other omics data.
Starting from 74538 directed interactions between 7018 network nodes, NTC matrices of multistep pathways are computed iteratively as described above (Eqs (5)–(10)). Complete lists of weighted and unweighted pairwise interactions (i.e., 1st order adjacency matrices) for tumor/norm, norm/tumor and ‘random expression’ samples are in S3 Table. With increasing pathway lengths, communicability matrices become densely populated. As shown in Fig 3, the occupancy of communicability matrices (i.e., the ratio of non-zero matrix entries to the dimension of the fully occupied matrix, 7018²) displays a particularly rapid increase from 0.15% to 53% at n = 4 and saturates around 70%. This means that the majority of network nodes are interconnected via n ≥ 4 distant pathways.
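The occupancy measure described above can be illustrated with a minimal sketch. It is not the NTC computation of Eqs (5)–(10); it only tracks which node pairs are joined by a walk of exactly k steps, which is an assumption about how the non-zero pattern grows with pathway length. The function name `occupancy_by_path_length` and the three-node toy graph are hypothetical and chosen for illustration only. The sketch also checks the first-order figure quoted in the text: 74538 interactions among 7018 nodes fill about 0.15% of the 7018 × 7018 matrix.

```python
import numpy as np

def occupancy_by_path_length(adjacency, max_steps):
    """Fraction of non-zero entries in the k-step walk pattern, for k = 1..max_steps.

    Assumption: occupancy is measured on walks of exactly k steps;
    edge weights are ignored and only the 0/1 pattern is propagated.
    """
    pattern = (np.asarray(adjacency) != 0).astype(np.int64)
    n_nodes = pattern.shape[0]
    reach = pattern.copy()
    fractions = []
    for _ in range(max_steps):
        fractions.append(np.count_nonzero(reach) / n_nodes**2)
        # Extend every k-step walk by one edge, then reduce to a 0/1 pattern
        # so that entries never grow large.
        reach = ((reach @ pattern) > 0).astype(np.int64)
    return fractions

# First-order occupancy from the numbers in the text.
first_order = 74538 / 7018**2  # roughly 0.0015, i.e. about 0.15%

# Hypothetical toy directed graph with edges 0->1, 0->2, 1->2, 2->0.
toy = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]])
toy_fractions = occupancy_by_path_length(toy, 3)
print(first_order, toy_fractions)
```

On the toy graph the occupied fraction grows from 4/9 to 5/9 to 7/9 over three steps, a small-scale analogue of the densification with pathway length reported for the interactome.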
Network-based approaches to mining omics data are increasingly popular. However, consistent modeling of biological networks remains a challenging task and requires the consideration of numerous factors whose impact on simulation results is still debated in the literature. These factors include the role of a particular network topology, directionality and size, as well as the choice of appropriate gene proximity metrics and numerical scores. In this work, we focused on the construction and evaluation of novel descriptors for measuring network information flow. We left the other issues largely unaddressed, assuming that simulation results obtained with different network topologies should, in general, be convergent.