Research Article: A sequential Monte Carlo algorithm for inference of subclonal structure in cancer

Date Published: January 25, 2019

Publisher: Public Library of Science

Author(s): Oyetunji E. Ogundijo, Kaiyi Zhu, Xiaodong Wang, Dimitris Anastassiou, Xiang Li.


Tumors are heterogeneous in the sense that they consist of multiple subpopulations of cells, referred to as subclones, each of which is characterized by a distinct profile of genomic variations such as somatic mutations. Inferring the underlying clonal landscape has become an important topic because it can help in understanding cancer development and progression and thereby help improve treatment. We describe a novel state-space model, based on the feature allocation framework, and an efficient sequential Monte Carlo (SMC) algorithm that uses somatic mutation data obtained from tumor samples to estimate the number of subclones as well as their characterization. By design, our approach can handle any number of mutations. In extensive simulations, our method achieves high accuracy in most cases and compares favorably with existing methods. Moreover, we demonstrate the validity of our method by analyzing real tumor samples from patients with multiple cancer types (breast, prostate, and lung). Our results reveal driver mutation events specific to cancer types and, through manual phylogenetic analysis, indicate clonal expansion. MATLAB code and datasets are available to download at:
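The abstract describes the model only at a high level. To make the pairing of a feature allocation model with SMC more concrete, here is a minimal Python sketch under assumptions that are ours, not the paper's: a fixed number of subclones K; a binary matrix Z whose row z_i marks which subclones carry mutation i; subclone proportions w with a Dirichlet(1, ..., 1) prior; a pure, diploid tumor in which every mutation is heterozygous, so the expected variant allele fraction (VAF) is 0.5 times the sum of z_ik w_k; and binomial variant read counts. Mutations are processed one at a time, which is what lets a sequential sampler scale to any number of mutations. The function names (`smc_subclones`, `log_binom_pmf`) and the toy data are hypothetical, and this is not the authors' MATLAB implementation.

```python
import itertools
import math
import random


def log_binom_pmf(alt, depth, p, log_coef):
    """Binomial log-probability of `alt` variant reads among `depth` reads at VAF p."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep the logs finite at p = 0 or 1
    return log_coef + alt * math.log(p) + (depth - alt) * math.log(1.0 - p)


def smc_subclones(alt, depth, K, n_particles=500, seed=0):
    """Hypothetical particle filter that processes mutations one at a time for fixed K.

    Each particle carries static subclone proportions w and the binary rows
    z_1..z_i of the feature allocation matrix assigned so far. Returns the
    final particles and an SMC estimate of log p(data | K).
    """
    rng = random.Random(seed)
    # Candidate rows: every non-empty subset of the K subclones.
    rows = [r for r in itertools.product((0, 1), repeat=K) if any(r)]
    log_prior_row = -math.log(len(rows))  # uniform prior over rows (assumption)

    particles = []
    for _ in range(n_particles):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(K)]  # Dirichlet(1, ..., 1) draw
        s = sum(g)
        particles.append({"w": [x / s for x in g], "Z": []})

    log_evidence = 0.0
    for a, d in zip(alt, depth):
        log_coef = math.lgamma(d + 1) - math.lgamma(a + 1) - math.lgamma(d - a + 1)
        log_inc = []
        for part in particles:
            # Expected VAF of a heterozygous mutation in a diploid region,
            # assuming a pure tumor: 0.5 * sum_k z_k * w_k.
            lj = [
                log_prior_row
                + log_binom_pmf(a, d, 0.5 * sum(z * wk for z, wk in zip(r, part["w"])), log_coef)
                for r in rows
            ]
            m = max(lj)
            probs = [math.exp(x - m) for x in lj]
            # Propose z_i from its exact conditional p(z_i | w, y_i) ...
            part["Z"].append(rng.choices(rows, weights=probs)[0])
            # ... so the incremental weight is the predictive likelihood p(y_i | w).
            log_inc.append(m + math.log(sum(probs)))
        mx = max(log_inc)
        ws = [math.exp(x - mx) for x in log_inc]
        # Weights were uniform before this step, so their mean estimates p(y_i | y_1..y_{i-1}).
        log_evidence += mx + math.log(sum(ws) / n_particles)
        # Multinomial resampling resets the weights to uniform.
        idx = rng.choices(range(n_particles), weights=ws, k=n_particles)
        particles = [{"w": particles[j]["w"][:], "Z": particles[j]["Z"][:]} for j in idx]
    return particles, log_evidence


# Toy data (assumption): true w = (0.7, 0.3), depth 200, ten mutations per pattern,
# with expected VAFs 0.35 (subclone 1 only), 0.15 (subclone 2 only), 0.50 (both).
alt = [70] * 10 + [30] * 10 + [100] * 10
depth = [200] * 30
log_ev = {K: smc_subclones(alt, depth, K)[1] for K in (1, 2, 3)}
for K, v in log_ev.items():
    print(f"K={K}: estimated log evidence {v:.1f}")
```

Comparing the log-evidence estimates across candidate values of K is one standard way to choose the number of subclones with SMC; the paper may select it differently. Because w is static within each particle, repeated resampling gradually collapses it onto a few values, so a practical sampler would add rejuvenation moves (for example, MCMC steps on w). The proportions are also identifiable only up to a relabeling of subclones.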

Partial Text

In most cases, tumors develop from a single population of cells. Accumulated somatic mutations confer selective advantages on the cells in this population over others [1], and this population of cells then continues to proliferate. As more somatic mutations are acquired, some tumor cells gain further survival advantages, leading to an expansion from a single population to multiple subpopulations. As a result, tumors are heterogeneous in nature [2, 3] and contain multiple subpopulations of cancerous cells, referred to as tumor subclones [2, 7, 8], each with a unique mutational profile [4–6]. The importance of analyzing tumor subclonal structure and evolutionary progression has been recognized, given its potential to elucidate the underlying mechanisms of cancer progression, metastatic spread, and therapy response [9–11].

The inherent heterogeneity of tumor samples often causes setbacks when cancer patients undergo treatment. The samples consist of different subpopulations of cancerous cells, each characterized by a distinct mutational profile. Inferring these profiles and the proportion of each subpopulation in the samples can improve personalized medicine, e.g., by helping to prevent cancer relapse and by informing prognosis. We propose an efficient sequential algorithm for estimating the mutational profile of each cancer cell subpopulation and their respective proportions in the tumor samples. We validated the algorithm in experiments on simulated datasets and applied it to real tumor samples covering three solid cancer types: prostate adenocarcinoma (PRAD), invasive ductal carcinoma of the breast (IDC), and lung adenocarcinoma (LUAD).



