Date Published: January 31, 2017
Publisher: Public Library of Science
Author(s): Alexander K. Hartmann, Grégory Nuel, Rongling Wu.
Triplet ordering preferences are used to perform Monte Carlo sampling of the posterior causal orderings that arise in the analysis of gene-expression experiments combining observational data with a usually small number of interventions, such as knock-outs. The performance of this sampling approach is compared with a previously used sampling based on pairwise ordering preferences and with sampling of the full posterior distribution. For a fair comparison, the latter approach is restricted to twice the numerical effort of the triplet-based approach. The comparison is carried out for artificially generated causal networks, i.e., directed acyclic graphs (DAGs), and for actual experimental data taken from the ROSETTA challenge. Sampling based on triplet ordering preferences turns out to be superior to both other approaches.
For the last 10 years, high-throughput omics data have raised many methodological challenges in systems biology. Among these challenges, the inference of gene-regulation networks has received a great deal of attention. In this context, Gaussian models like the graphical lasso or approaches based on mutual information are very popular for inferring gene-regulation networks. When time-resolved data are available, methods such as dynamic Bayesian networks or ordinary differential equations can be applied. Another popular approach, following the work of Pearl, focuses on causal Gaussian Bayesian networks and performs intervention calculus, which has proved able to retrieve bounds on causal effects and thus to partially determine causal relationships using only observational data. In this paper we focus on estimating causal Bayesian networks in the presence of arbitrary mixtures of (non-time-resolved) observational and interventional data [8, 9], i.e., wild-type and knock-out/knock-down experiments with possibly multiple interventions within each experiment.
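To make the data setting concrete, the following is a minimal sketch of how a mixture of observational and interventional samples could be simulated from a Gaussian causal Bayesian network. It is an illustration under stated assumptions, not the authors' data-generation procedure: the function name `simulate`, the linear structural equations, the unit noise level, and the modelling of a knock-out as clamping a gene to 0 are all hypothetical choices.

```python
import random

def simulate(weights, order, knockouts=(), noise_sd=1.0, rng=random):
    """Draw one expression profile from a linear Gaussian causal network.

    weights[(i, j)] is the assumed effect of gene i on gene j (edge i -> j).
    order is a causal ordering: every parent appears before its children.
    knockouts lists genes clamped to 0 (assumption: a knock-out removes
    all incoming influence and fixes the expression value).
    """
    x = {}
    for j in order:
        if j in knockouts:
            x[j] = 0.0  # intervention: ignore parents, fix the value
            continue
        # Mean is the weighted sum over the parents of gene j.
        mean = sum(w * x[i] for (i, k), w in weights.items() if k == j)
        x[j] = rng.gauss(mean, noise_sd)
    return x

# Hypothetical three-gene chain 0 -> 1 -> 2.
rng = random.Random(0)
weights = {(0, 1): 2.0, (1, 2): -1.0}
order = [0, 1, 2]
wild_type = [simulate(weights, order, rng=rng) for _ in range(1000)]
knock_out = [simulate(weights, order, knockouts={1}, rng=rng) for _ in range(1000)]
```

In this toy chain, knocking out gene 1 cuts gene 2 off from gene 0, so the spread of gene 2 shrinks to the noise level. Asymmetries of this kind are what allow interventional data to constrain the causal ordering beyond what observational data alone can do.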
To evaluate and compare the power of the Babington-Smith pair and triplet approaches, we applied them to data generated from DAG ensembles of different graph sizes as well as to data obtained from biological applications.
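For orientation, a Babington-Smith model assigns each ordering a probability proportional to the product of the pairwise preference probabilities it agrees with. The sketch below samples orderings from such a model with a Metropolis scheme using random transpositions. It is a minimal illustration, not the authors' implementation: the names `log_weight`, `sample_orderings`, and `pair_pref`, the proposal move, and the example preference values are assumptions.

```python
import itertools
import math
import random

def log_weight(order, pair_pref):
    """Unnormalised Babington-Smith log-probability of an ordering.

    pair_pref[(a, b)] with a < b is the assumed preference probability
    that a precedes b; it must lie strictly between 0 and 1.
    """
    pos = {g: k for k, g in enumerate(order)}
    total = 0.0
    for a, b in itertools.combinations(sorted(order), 2):
        p = pair_pref[(a, b)]
        total += math.log(p if pos[a] < pos[b] else 1.0 - p)
    return total

def sample_orderings(genes, pair_pref, n_steps, rng):
    """Metropolis sampling over orderings with transposition proposals."""
    current = list(genes)
    cur_lw = log_weight(current, pair_pref)
    samples = []
    for _ in range(n_steps):
        i, j = rng.sample(range(len(current)), 2)
        proposal = current[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        new_lw = log_weight(proposal, pair_pref)
        # Symmetric proposal: accept with probability min(1, w_new / w_old).
        if rng.random() < math.exp(min(0.0, new_lw - cur_lw)):
            current, cur_lw = proposal, new_lw
        samples.append(tuple(current))
    return samples

# Hypothetical preferences that strongly favour the ordering 0, 1, 2.
prefs = {(0, 1): 0.95, (0, 2): 0.95, (1, 2): 0.95}
draws = sample_orderings([0, 1, 2], prefs, 5000, random.Random(1))
```

A triplet-based variant would presumably replace the product over pairs by a product over triples, with one preference probability for each of the six possible orders of a triple; that extension is not shown here.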
To summarize, we studied the estimation of causal orderings and corresponding parameters from sampled data that include interventions. In particular, we compared pairwise Babington-Smith sampling, which was discussed before, with the triplet-wise sampling introduced in this work. All results show a much better performance for the triplet sampling approach. When the numerical effort was limited to about twice the running time of the triplet sampling, sampling using the full maximum likelihood turned out to be much worse than both pair- and triplet-wise sampling.