Research Article: Identification of marginal causal relationships in gene networks from observational and interventional expression data

Date Published: March 16, 2017

Publisher: Public Library of Science

Author(s): Gilles Monneret, Florence Jaffrézic, Andrea Rau, Tatiana Zerjal, Grégory Nuel, Frank Emmert-Streib.

http://doi.org/10.1371/journal.pone.0171142

Abstract

Causal network inference is an important methodological challenge in biology as well as in other areas of application. Although several causal network inference methods have been proposed in recent years, they are typically applicable to only a small number of genes, due to the large number of parameters to be estimated and the limited number of biological replicates available. In this work, we consider the specific case of transcriptomic studies made up of both observational and interventional data in which a single gene of biological interest is knocked out. We focus on a marginal causal estimation approach, based on the framework of Gaussian directed acyclic graphs, to infer causal relationships between the knocked-out gene and a large set of other genes. In a simulation study, we found that our proposed method accurately distinguishes downstream causal relationships from those that are upstream or simply associative. It also enables estimation of the total causal effects between the gene of interest and the remaining genes. Our method performed very similarly to a classical differential analysis for experiments with a relatively large number of biological replicates, but has the advantage of providing a formal causal interpretation. Our proposed marginal causal approach is computationally efficient and may be applied to several thousand genes simultaneously. In addition, it may help highlight subsets of genes of interest for a more thorough subsequent causal network inference. The method is implemented in an R package called MarginalCausality (available on GitHub).

Partial Text

Causal network inference is of great interest in systems biology, particularly for transcriptomic studies that aim to identify regulatory relationships among genes, i.e., gene regulatory networks. In the context of probabilistic graphical models, several algorithms have been proposed to infer the skeleton of directed, undirected, or partially directed graphs using conditional independence tests [1, 2], score-based procedures [3–6], or mutual information [7–10]. These skeletons correspond to an equivalence class, i.e., a set of graphs that cannot be distinguished from one another using observational data alone. Undirected graphs can be used to obtain a supergraph of the skeleton of a directed graph, which is a good starting point for inferring causality when the underlying graph is unknown. Several undirected network inference methods based on the parsimonious (sparse) estimation of the inverse covariance matrix have also been proposed for Gaussian graphical models [11, 12]. Although methods based on mutual information can also be used to infer the full graph of undirected networks [13, 14], estimating causal networks with these algorithms tends to be very computationally demanding and feasible only for low-dimensional networks. In addition, such approaches require a significant amount of interventional data to reduce the space of equivalent networks [15]. However, even with a sufficient amount of interventional data, i.e., roughly one knock-out for each gene, a directed acyclic graph (DAG) generally cannot be accurately estimated [16], perhaps because of the heterogeneous coverage of the gene network space [17]. For this reason, in this work we focus on estimating a few causal effects rather than attempting to infer the full network [18].
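Why observational data identify only an equivalence class, and why interventions help, can be checked on the smallest possible example. The sketch below assumes two genes X and Y following a linear Gaussian structural equation model with hypothetical parameter values (b = 1.5, s2 = 0.25, knockout level x0 = -3). The models X → Y and Y → X can be parameterized to produce exactly the same covariance matrix, so no amount of wild-type data separates them. A knockout of X, by contrast, shifts the mean of Y only under X → Y. This asymmetry is what makes a single-gene knockout informative about direction.

```python
import math

# Two hypothetical linear Gaussian SEMs for genes X and Y (zero means):
#   Model 1 (X -> Y): X = e1, Y = b*X + e2,  Var(e1) = 1, Var(e2) = s2
#   Model 2 (Y -> X): Y = f1, X = c*Y + f2,  parameters chosen to mimic Model 1


def covariance_x_to_y(b, s2):
    """(Var X, Cov XY, Var Y) implied by Model 1."""
    return (1.0, b, b * b + s2)


def reverse_model(b, s2):
    """Parameters (Var Y, c, Var f2) of a Y -> X model with Model 1's covariance."""
    var_y = b * b + s2
    c = b / var_y        # regression coefficient of X on Y
    var_f2 = s2 / var_y  # residual variance 1 - c^2 * Var(Y), always > 0
    return var_y, c, var_f2


def covariance_y_to_x(var_y, c, var_f2):
    """(Var X, Cov XY, Var Y) implied by Model 2."""
    return (c * c * var_y + var_f2, c * var_y, var_y)


b, s2 = 1.5, 0.25
cov1 = covariance_x_to_y(b, s2)
cov2 = covariance_y_to_x(*reverse_model(b, s2))
same_observational = all(math.isclose(p, q) for p, q in zip(cov1, cov2))
print("same observational distribution:", same_observational)

# A knockout do(X = x0) deletes X's own equation but leaves the other equations intact.
x0 = -3.0
mean_y_model1 = b * x0  # Y still depends on X, so its mean moves to b * x0
mean_y_model2 = 0.0     # Y is upstream of X, so its distribution is unchanged
print("E[Y | do(X = x0)]:", mean_y_model1, "vs", mean_y_model2)
```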

We have proposed a novel approach to detect marginal causal relationships in high-dimensional data when interventions are available for a single node of interest. This method was developed in the context of transcriptomic data and can be particularly useful for pre-selecting genes prior to a more thorough causal network inference. It is computationally efficient and can be applied simultaneously to thousands of genes. In addition, our simulation study illustrated that the proposed method was able to accurately distinguish downstream causal relationships from upstream or simple correlation relationships when the underlying DAG is unknown.

 
