Date Published: March 30, 2019
Publisher: BioMed Central
Author(s): Morteza Chalabi Hajkarim, Eli Upfal, Fabio Vandin.
We study the problem of identifying differentially mutated subnetworks of a large gene–gene interaction network, that is, subnetworks that display a significant difference in mutation frequency in two sets of cancer samples. We formally define the associated computational problem and show that the problem is NP-hard.
We propose a novel and efficient algorithm, called DAMOKLE, to identify differentially mutated subnetworks given genome-wide mutation data for two sets of cancer samples. We prove that DAMOKLE identifies subnetworks with statistically significant difference in mutation frequency when the data comes from a reasonable generative model, provided enough samples are available.
We test DAMOKLE on simulated and real data, showing that DAMOKLE does indeed find subnetworks with significant differences in mutation frequency and that it provides novel insights into the molecular mechanisms of the disease not revealed by standard methods.
The analysis of molecular measurements from large collections of cancer samples has revolutionized our understanding of the processes leading to a tumour through somatic mutations, changes of the DNA appearing during the lifetime of an individual . One of the most important aspects of cancer revealed by recent large cancer studies is inter-tumour genetic heterogeneity: each tumour presents hundreds-thousands mutations and no two tumours harbour the same set of DNA mutations .
This section presents the problem we study, the algorithm we propose for its solution, and the analysis of our algorithm. In particular, “Computational problem” section formalizes the computational problem we consider; “Algorithm” section presents DifferentiAlly Mutated subnetwOrKs anaLysis in cancEr (DAMOKLE), our algorithm for the solution of the computational problem; “Analysis of DAMOKLE” section describes the analysis of our algorithm under a reasonable generative model for mutations; “Statistical significance of the results” section presents a formal analysis of the statistical significance of subnetworks obtained by DAMOKLE; and “Permutation testing” section describes two permutation tests to assess the significance of the results of DAMOKLE for limited sample sizes.
We implemented DAMOKLE in Python1 and tested it on simulated and on cancer data. Our experiments have been conducted on a Linux machine with 16 cores and 256 GB of RAM. For all our experiments we used as interaction graph G the HINT+HI2012 network2, a combination of the HINT network  and the HI-2012  set of protein–protein interactions, previously used in . In all cases we considered only the subnetwork with the highest differential coverage among the ones returned by DAMOKLE. We first present the results on simulated data (“Simulated data” section) and then present the results on cancer data (“Cancer data” section).
In this work we study the problem of finding subnetworks of a large interaction network with significant difference in mutation frequency in two sets of cancer samples. This problem is extremely important to identify mutated mechanisms that are specific to a cancer (sub)type as well as for the identification of mechanisms related to clinical features (e.g., response to therapy). We provide a formal definition of the problem and show that the associated computational problem is NP-hard. We design, analyze, implement, and test a simple and efficient algorithm, DAMOKLE, which we prove identifies significant subnetworks when enough data from a reasonable generative model for cancer mutations is provided. Our results also show that the subnetworks identified by DAMOKLE cannot be identified by methods not designed for the comparative analysis of mutations in two sets of samples. We tested DAMOKLE on simulated and real data. The results on simulated data show that DAMOKLE identifies significant subnetworks with currently available sample sizes. The results on two large cancer datasets, each comprising genome-wide measurements of DNA mutations in two cancer subtypes, shows that DAMOKLE identifies subnetworks that are not found by methods not designed for the comparative analysis of mutations in two sets of samples.