Research Article: An Overview of the Statistical Methods Used for Inferring Gene Regulatory Networks and Protein-Protein Interaction Networks

Date Published: February 21, 2013

Publisher: Hindawi Publishing Corporation

Author(s): Amina Noor, Erchin Serpedin, Mohamed Nounou, Hazem Nounou, Nady Mohamed, Lotfi Chouchane.


The large influx of data from high-throughput genomic and proteomic technologies has encouraged the researchers to seek approaches for understanding the structure of gene regulatory networks and proteomic networks. This work reviews some of the most important statistical methods used for modeling of gene regulatory networks (GRNs) and protein-protein interaction (PPI) networks. The paper focuses on the recent advances in the statistical graphical modeling techniques, state-space representation models, and information theoretic methods that were proposed for inferring the topology of GRNs. It appears that the problem of inferring the structure of PPI networks is quite different from that of GRNs. Clustering and probabilistic graphical modeling techniques are of prime importance in the statistical inference of PPI networks, and some of the recent approaches using these techniques are also reviewed in this paper. Performance evaluation criteria for the approaches used for modeling GRNs and PPI networks are also discussed.

Partial Text

Postgenomic era is marked by the availability of a deluge of genomic data and has, thus, enabled the researchers to look towards new dimensions for understanding the complex biological processes governing the life of a living organism [1–5]. The various life sustaining functions are performed via a collaborative effort involving DNA, RNA, and proteins. Genes and proteins interact with themselves and each other and orchestrate the successful completion of a multitude of important tasks. Understanding how they work together to form a cellular network in a living organism is extremely important in the field of molecular biology. Two important problems in this considerably nascent field of computational biology are the inference of gene regulatory networks and the inference of protein-protein interaction networks. This paper first looks at how the genes and proteins interact with themselves and then discusses the inference of an integrative cellular network of genes and proteins combined.

The postgenomic era is distinguished by the availability of huge amount of biological data sets which are quite heterogenous in nature and difficult to analyze [3]. It is expected that these data sets can aid in obtaining useful knowledge about the underlying interactions in gene-gene and protein-protein networks. This section reviews some of the main types of data used for the inference of genomic and proteomic networks, including, gene expression data, protein-protein interaction data, and ChIP-chip data.

Gene regulatory networks capture the interactions present among the genes. Accurate and reliable estimation of gene networks is significantly crucial and can reap far-reaching benefits in the field of medicinal biology, for example, in terms of developing personalized medicines. The following subsections review the main statistical methods used for inference of gene regulatory networks. First, the important class of probabilistic graphical models is presented.

Having examined the gene network inference problem, this section describes the statistical methods that are used to find reliable and complete protein-protein interaction networks. As opposed to gene networks which are mostly inferred using the expression data or the likes of it, inference of PPI networks can be carried out in various ways such as phylogenetic profiling and identification of structural patterns. This paper focuses only on the methods that employ PPI data to make inference. The given data in this scenario are the protein-protein interactions. However, such data sets consist of a large number of false positives and negatives and are far from being complete and homogeneous. Therefore, only a small overlap is found between the PPI data sets obtained from various sources. However, it is observed that the interactions predicted by more than one method are more reliable [37]. One of the challenges is the large number of interactions indicated by the PPI data as opposed to the considerably fewer interactions assumed to be present in reality. Therefore, the problem in this scenario is to find more reliable interactions and predict the yet unknown interactions. In addition, the protein interactions can be of different types ranging from stable ones to transient ones [37].

The advances in reverse engineering of GRNs and PPI networks have paved the way for joint estimation of GRNs and PPI networks [41]. This is a step towards the inference of an integrated network consisting of genes, proteins, and transcription factors, indicating interactions among themselves and each other. Figure 6 shows the schematic of an integrated cellular network. In this section, we review two important ways of estimating a joint network.

The inference accuracy can be assessed using the knowledge of a gold-standard network or the true network. In order to benchmark the algorithms, the correctly identified edges or true positives (TPs) need to be calculated. In addition, the number of false positives (FPs), or the edges incorrectly indicated to be present, and false negatives (FNs) which is the missed detection should also be counted [10]. With these values in hand, true positive rate or recall; that is, TPR = TP/(TP + FN), false positive rate; that is, FPR = FP/(FP + TN), and positive predictive value; that is, PPV = TP/(TP + FP), also called the precision, can be calculated. These quantities enable us to view the performance graphically by the area under the ROC curve which plots FPR versus the TPR. These criteria are most widely used as the fidelity criterion for gene network inference algorithms.

This paper reviews the main statistical methods used for inference of gene and protein-protein networks. PPI network inference can be carried out in a wide variety of ways by exploiting phylogenetics information and sequencing data. This paper focused only on those inference methods that employ PPI data.




Leave a Reply

Your email address will not be published.