Research Article: Insights from deconvolution of cell subtype proportions enhance the interpretation of functional genomic data

Date Published: April 25, 2019

Publisher: Public Library of Science

Author(s): Yu Kong, Deepa Rastogi, Cathal Seoighe, John M. Greally, Masako Suzuki, Gualtiero I. Colombo.


Variability in cell subtype proportions between samples contributes significantly to the variation of functional genomic properties such as gene expression or DNA methylation. Although the impact of cell subtype composition on measured genomic quantities is recognized, and some innovative tools have been developed for the analysis of heterogeneous samples, most functional genomics studies using samples with mixed cell types still ignore the influence of cell subtype proportion variation, or treat it merely as a nuisance variable to be eliminated. Here we demonstrate how harvesting information about cell subtype proportions from functional genomics data can provide insights into cellular changes associated with phenotypes. We focused on two types of mixed cell populations, human blood and mouse kidney. Cell type prediction is well developed in the former, but not currently in the latter. Estimating the cellular repertoire is easier when a reference dataset from purified samples of all cell types in the tissue is available, as is the case for blood. However, reference datasets are not available for most other tissues, such as the kidney. In this study, we showed that the proportion of alterations attributable to changes in cellular composition differs strikingly between two disorders (asthma and systemic lupus erythematosus), suggesting that the contribution of cell subtype proportion changes to functional genomic properties can be disease-specific. We also showed that a reference dataset from a single-cell RNA-seq study successfully estimated the cell subtype proportions in mouse kidney and allowed us to distinguish the cell subtype alterations in two different knock-out mouse models, both of which had been reported to have fewer glomeruli than their wild-type counterparts. These findings demonstrate that testing for changes in cell subtype proportions between conditions can yield important insights in functional genomics studies.

Partial Text

Assays that test genomic function are used to understand the cellular and genetic differences in phenotypes between individuals. In human disease studies, we invariably test samples that are composed of mixed populations of cell subtypes when performing commonly used functional genomic assays, including gene expression profiling and assays testing DNA methylation. To date, several cell type deconvolution approaches for genome-wide assays have been published and applied to test for sample heterogeneity in gene expression [1–7] or DNA methylation [8–15], most often in studies of tumors or peripheral blood mononuclear cells (PBMCs) [2,6,7,9,12,13,15–17]. The influence of variability in cellular composition between samples on gene expression patterns has been recognized for decades [18], the associations between immune cell infiltration and tumor prognosis have been well demonstrated [19–25], and innovative approaches to identify cell-intrinsic changes (those not attributable to cell subtype effects) have been developed [9,12,26–28]. Despite this, many studies still omit even passing consideration of the effects of cell subtype proportion when interpreting results of genome-wide assays. Furthermore, when the influence of cell subtype variation is included in the analysis of functional genomics studies, in most cases the cellular heterogeneity is treated as a nuisance variable that confounds the researchers’ ability to identify cell-intrinsic changes. By treating cell proportion variation as a nuisance variable to exclude, we fail to identify potentially interesting tissue compositional differences associated with disease phenotypes.
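As a rough illustration of the idea behind reference-based deconvolution, the sketch below models a bulk expression profile as a non-negative mixture of reference profiles from purified cell types and recovers the mixing proportions by projected gradient descent. It is a minimal sketch under stated assumptions, not the method used in this study or in any of the cited tools: the function name `estimate_proportions`, the toy signature matrix, and the cell type labels are all hypothetical, and real applications use many more marker genes and more robust estimators.

```python
def estimate_proportions(signature, mixture, n_iter=2000):
    """Estimate cell subtype proportions for one bulk sample.

    signature: rows are genes, columns are reference cell types
               (hypothetical expression values from purified cells).
    mixture:   bulk expression of the same genes in the mixed sample.
    Minimises 0.5 * ||signature @ p - mixture||^2 subject to p >= 0
    by projected gradient descent, then rescales p to sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The sum of squared entries bounds the gradient's Lipschitz constant,
    # so its reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(row[k] * p[k] for k in range(n_types)) - m
            for row, m in zip(signature, mixture)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[k] - step * gradient[k]) for k in range(n_types)]
    total = sum(p)
    if total == 0:
        raise ValueError("all estimated proportions are zero")
    return [v / total for v in p]


# Toy example (made-up numbers): 4 genes, 3 hypothetical cell types.
cell_types = ["type_A", "type_B", "type_C"]
signature = [
    [10.0, 1.0, 1.0],   # marker gene for type_A
    [1.0, 8.0, 1.0],    # marker gene for type_B
    [1.0, 1.0, 12.0],   # marker gene for type_C
    [5.0, 5.0, 5.0],    # gene expressed equally in all types
]
true_proportions = [0.6, 0.3, 0.1]
# Simulate a noise-free bulk sample as the weighted sum of the references.
mixture = [
    sum(s * w for s, w in zip(row, true_proportions)) for row in signature
]

estimated = estimate_proportions(signature, mixture)
for name, value in zip(cell_types, estimated):
    print(name, round(value, 3))
```

Proportions estimated this way for each sample could then be compared between conditions as quantities of interest in their own right, rather than only being included as covariates to be adjusted away.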

By using assays that test expression of genes or microRNAs, the methylation of DNA, chromatin states or other indicators of genomic function, we are generally trying to understand the innate characteristics of the cells tested. Such cell-intrinsic changes can reflect responses to environmental perturbations or genetic mutations, and can be used as clues to the pathogenesis of an associated phenotype. We have referred to this as cellular reprogramming [29], the alteration of the molecular characteristics of a canonical cell type. The possibility that heterogeneity in cell subtype proportions contributes to the variability in the results of a functional genomics assay is not always considered, and when it is addressed, it is generally treated as a confounding variable, with the focus remaining on cell-intrinsic changes in functional genomic properties.