Research Article: A robust data-driven genomic signature for idiopathic pulmonary fibrosis with applications for translational model selection

Date Published: April 18, 2019

Publisher: Public Library of Science

Author(s): Ron Ammar, Pitchumani Sivakumar, Gabor Jarai, John Ryan Thompson, Antje Prasse.


Idiopathic pulmonary fibrosis (IPF) is a chronic and progressive lung disease affecting ~5 million people globally. We have constructed an accurate model of IPF disease status using elastic net regularized regression on clinical gene expression data. Leveraging whole-transcriptome microarray data from 230 IPF and 89 control samples from Yang et al. (2013), sourced from the Lung Tissue Research Consortium (LTRC) and National Jewish Health (NJH) cohorts, we identify an IPF gene expression signature. We performed optimal feature selection to reduce the number of transcripts required by our model to a parsimonious set of 15. This signature enables our model to accurately separate IPF patients from controls. Our model outperforms existing published models when tested with multiple independent clinical cohorts. Our study underscores the utility of elastic nets for gene signature/panel selection, which can be used to construct a multianalyte biomarker of disease. We also filter the gene sets used for model input to construct a model reliant on secreted proteins. Using this approach, we find that the preclinical bleomycin rat model is most congruent with human disease at day 21 post-bleomycin administration, in contrast with the earlier timepoints suggested by other studies.
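To illustrate how elastic net regularization can shrink a classifier down to a small gene panel, the following is a minimal sketch of an elastic net penalized logistic regression fitted by proximal gradient descent. It is not the authors' implementation: the function name `fit_elastic_net_logistic`, the parameters `lam`, `alpha`, `step` and `n_iter`, and the synthetic "expression" data are all assumptions made for illustration, and the text above does not state which software or hyperparameters were used.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_elastic_net_logistic(X, y, lam=0.12, alpha=0.8, step=0.5, n_iter=300):
    """Minimise mean log-loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).

    Hypothetical helper: alpha mixes the lasso (L1) and ridge (L2) penalties,
    lam sets the overall penalty strength. The intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term),
            # followed by the L1 proximal step that produces exact zeros.
            smooth = grad_w[j] + lam * (1.0 - alpha) * w[j]
            w[j] = soft_threshold(w[j] - step * smooth, step * lam * alpha)
    return w, b


# Synthetic stand-in for expression data (an assumption, not the study data):
# 10 "genes", of which only genes 0 and 1 carry signal about disease status.
random.seed(0)
n_samples, n_genes = 400, 10
X, y = [], []
for _ in range(n_samples):
    row = [random.gauss(0.0, 1.0) for _ in range(n_genes)]
    signal = row[0] - row[1] + 0.5 * random.gauss(0.0, 1.0)
    X.append(row)
    y.append(1 if signal > 0 else 0)

weights, intercept = fit_elastic_net_logistic(X, y)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("genes retained by the penalty:", selected)
```

The L1 component sets the coefficients of uninformative features to exactly zero, which is what makes a parsimonious panel possible, while the L2 component stabilizes the estimates when features are correlated. In practice the penalty strength and mixing parameter would typically be chosen by cross-validation rather than fixed by hand as in this sketch.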

Partial Text

Idiopathic Pulmonary Fibrosis (IPF) is a fatal disease of unknown etiology characterized by scarring of the lung parenchyma, resulting in progressive loss of lung function and eventual death [1]. Although two recently approved drugs, pirfenidone and nintedanib, reduce lung function decline in IPF, their efficacy is limited and their mechanism of action is poorly understood [2–4]. Even though meta-analyses of large clinical trials suggest that pirfenidone reduces the risk of mortality [5], lung transplant remains the only option to significantly prolong survival in IPF, indicating a dire need for new therapies. Development of new drugs for IPF is extremely challenging due to complicated diagnosis, limited disease understanding, and the lack of both robust pre-clinical models predictive of human disease and biomarkers of disease progression and drug treatment. Current diagnosis of IPF requires careful integration of radiographic findings (honeycombing and presence of fibroblast foci), lung function (FVC, FEV1 and 6-minute walk test) and clinical data, together with the rational exclusion of other potentially similar interstitial lung diseases [6]. Often, the disease is diagnosed at an advanced stage when it is refractory to treatment. Therefore, there is a pressing need to develop newer, less invasive and robust methods to efficiently diagnose IPF and enable early intervention strategies. Transcriptomic and proteomic disease signatures generated from clinically relevant human samples, including tissue and plasma, combined with robust in silico modeling can enable translational disease understanding, diagnosis and stratification of patients for effective drug treatments. Several studies have utilized microarray profiling of IPF patient-derived lung tissue to define genes and/or pathways that are differentially regulated in comparison to healthy controls or patients with other lung diseases [4,7–9] and to define signatures for disease classification. Peripheral blood profiling across small cohorts of patients has also identified potential biomarkers of disease such as MMP1 and MMP7 [10–12].

Given the challenges associated with the diagnosis of IPF and the inaccuracy of clinical prediction tools, it is imperative to explore new methods for diagnosis, classification and patient stratification. We have leveraged microarray data from a large cohort of IPF patients within the LTRC to generate a new computational classifier of IPF disease. Although IPF disease signatures have been described before [9,13,19,20], the strengths of our approach are the number of samples used, the unbiased computational model developed to define the signature and the extensive validation across multiple IPF cohorts. Our model outperforms several previous models, as shown by its near-100% correct prediction of disease status across multiple validation cohorts. Bauer et al. (2015) described a 12-gene signature identified from about 100 IPF samples compared with control lungs and established the commonality of this signature with that derived from the rat model of bleomycin-induced fibrosis at the 7-day time point. Our study complements and extends these findings by developing alternate signatures and establishing congruence with the rat model of bleomycin-induced fibrosis. Tissue and peripheral gene/protein expression signatures provide complex information that could be poorly or incompletely understood in the absence of effective computational modeling. Our study identifies a novel 15-gene signature that accurately predicts IPF disease status (Table 3). The signature contains several genes not previously associated with IPF, as well as genes such as MMP7, a known biomarker for IPF [10,11], and sFRP2, a Wnt-signaling molecule described as a prospective therapeutic target [40]. Notably, MMP7 knockout mice do not develop fibrosis in response to bleomycin treatment [41]. In addition, active MMP7 has been detected in IPF lungs but not in healthy lungs and has been implicated as a profibrotic metalloprotease [42,43]. Glutathione peroxidase 3 (GPX3), identified in our signature, has been shown to be present in the epithelial lining fluid in the bleomycin-induced fibrosis model and to be upregulated in IPF [44].