Research Article: Generation and Analysis of Large-Scale Data-Driven Mycobacterium tuberculosis Functional Networks for Drug Target Identification

Date Published: November 29, 2011

Publisher: Hindawi Publishing Corporation

Author(s): Gaston K. Mazandu, Nicola J. Mulder.


Technological developments in large-scale biological experiments, coupled with bioinformatics tools, have opened the doors to computational approaches for the global analysis of whole genomes. This has provided the opportunity to look at genes within their context in the cell. The integration of vast amounts of data generated by these technologies provides a strategy for identifying potential drug targets within microbial pathogens, the causative agents of infectious diseases. As proteins are druggable targets, functional interaction networks between proteins are used to identify proteins essential to the survival, growth, and virulence of these microbial pathogens. Here we have integrated functional genomics data to generate functional interaction networks between Mycobacterium tuberculosis proteins and carried out computational analyses to dissect the functional interaction network produced for identifying drug targets using network topological properties. This study has provided the opportunity to expand the range of potential drug targets and to move towards optimal target-based strategies.

Partial Text

Throughout history, infectious diseases caused by microbial pathogens have had a devastating impact on human morbidity and mortality, and they remain of great concern even today. With the advance of new high-throughput sequencing technologies, there has been an increase in the number of microbial genome sequencing projects worldwide, which has yielded complete genome sequences of crucial microbial pathogens of humans, animals, and plants. Analyses of these genome sequences have provided valuable insights into the dynamics driving pathogenic mechanisms and numerous virulence factors and have shed light on the targeted organism’s biology [1]. The characteristic features of pathogenic organisms include their ability to colonize a specific host organ or tissue, to adapt to their environment, and to evade the host immune response [2]. Disease then develops as a result of a delicate and dynamic balance between the pathogen and the host defence system.

An MTB functional interaction network was built by integrating interaction datasets from the STRING database with additional interaction data derived from sequence similarity and signatures, and from microarray data. The STRING database [22, 23] integrates known and predicted protein-protein associations derived from high-throughput experimental data, from the mining of databases and literature, and from predictions based on genomic analysis for a large number of organisms. Functional interactions from the STRING database are used with confidence scores as defined by the STRING schemes. The evidence types include conserved genomic neighbourhood, gene fusion events, phylogenetic profile or gene co-occurrence across multiple genomes, text mining, experiments, and other databases. Functional interaction pairs predicted from protein sequence similarity and conserved protein signatures are scored using information-theoretic approaches, which translate conserved evolutionary features of proteins into confidence scores [30]. We used a random partial least squares regression technique to infer genes with similar expression profiles from multiple public microarray datasets and to generate functional connection scores between proteins [31]. The combined link confidence score between two proteins i and j, which provides an integrated view of all datasets through a unified network as shown in Figure 1, is given by
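The combining formula itself is not reproduced in this excerpt. As a minimal sketch, assuming the scheme commonly associated with STRING, in which per-source scores are treated as independent probabilities and combined as S_ij = 1 - Π_d (1 - s_ij^d), the integration step could look like the following. The function name `combined_score` and the example scores are hypothetical and are not taken from the paper.

```python
from math import prod


def combined_score(scores):
    """Combine per-source confidence scores for one protein pair.

    Assumption (not confirmed by this excerpt): sources are treated as
    independent, so the combined score is 1 - prod(1 - s) over all sources.
    Each score must lie in [0, 1]; a missing source contributes nothing.
    """
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical scores for one protein pair from three evidence sources.
pair_scores = [0.4, 0.5, 0.2]
print(round(combined_score(pair_scores), 3))  # prints 0.76
```

Under this assumption the combined score is never lower than the largest individual score, so three medium or low scores (0.4, 0.5, 0.2) can together yield a high-confidence link.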

We have generated an MTB functional interaction network from nine biological data sources, and a summary of the number of interactions and confidence scores is shown in Table 1. For each evidence source, functional interaction scores are categorized into three confidence levels: low, medium, and high. The final row shows the number of interactions in each confidence range for the final combined score. For a given data source, interactions with scores strictly less than 0.3 are considered low confidence, scores from 0.3 to 0.7 inclusive (0.3 ≤ score ≤ 0.7) are classified as medium confidence, and scores greater than 0.7 are considered high confidence. Furthermore, the confidence increases when interaction data are integrated into a single network, producing more medium and high confidence links in the last row than when considering only one type of data. To understand the biological organization of the organism from its protein functional network, and to use this as a means to develop appropriate treatment strategies for the disease, complete knowledge of the network structure and of the contribution of each protein to the system’s biological processes is required. To this end, network centrality measures are used to reveal proteins which are potentially crucial to the functioning of the system and thus contribute to the survival of the organism.
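The thresholds and the centrality analysis described above can be illustrated with a short sketch. The excerpt does not state which centrality measures were used, so degree and closeness centrality are assumed here purely for illustration. The protein labels (P1 to P5), the edge scores, and all function names are hypothetical.

```python
from collections import deque


def confidence_level(score):
    """Map a score to the confidence levels defined in the text."""
    if score < 0.3:
        return "low"
    if score <= 0.7:
        return "medium"
    return "high"


def build_graph(edges, keep=("medium", "high")):
    """Build an undirected adjacency map from (a, b, score) triples,
    keeping only edges whose confidence level is listed in `keep`."""
    graph = {}
    for a, b, score in edges:
        if confidence_level(score) in keep:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {v: len(nb) / (n - 1) if n > 1 else 0.0 for v, nb in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths,
    computed by breadth-first search within each node's component."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical combined scores between five placeholder proteins.
edges = [
    ("P1", "P2", 0.9), ("P1", "P3", 0.5), ("P1", "P4", 0.8),
    ("P2", "P3", 0.2), ("P4", "P5", 0.75),
]
graph = build_graph(edges)  # the low-confidence P2-P3 link is dropped
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
ranked = sorted(graph, key=lambda v: (-degree[v], -closeness[v], v))
print(ranked[0], degree[ranked[0]], closeness[ranked[0]])  # prints P1 0.75 0.8
```

In this toy network P1 ranks first on both measures, which is the kind of signal the text uses to flag proteins as potentially crucial to the system.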

In this study, we have produced an MTB functional network and identified proteins which are essential to the functioning of the system using network centrality measures. We showed that proteins contributing to the survival of the bacterial pathogen within the host are potential drug targets, and many have previously been identified as such by different methods. These data can be used to enhance the discovery of new drugs against the disease caused by this organism, which currently constitutes a public health challenge.

N. J. Mulder generated and supervised the project and finalized the manuscript. G. K. Mazandu analyzed, designed, and implemented the model and wrote the manuscript. N. J. Mulder and G. K. Mazandu analyzed the data and read and approved the final manuscript, and N. J. Mulder approved the production of this paper.

The authors declare that they have no conflicts of interest.