Date Published: May 22, 2019
Publisher: Public Library of Science
Author(s): Jimmy Omony, Anne de Jong, Jan Kok, Sacha A. F. T. van Hijum, Arun K. Bhunia.
Lactic acid bacteria are Gram-positive bacteria used throughout the world in many industrial applications for their acidification, flavor and texture formation attributes. One of the species, Lactococcus lactis, is employed for the production of fermented milk products like cheese, buttermilk and quark. It ferments lactose to lactic acid and thus helps improve the shelf life of the products. Many physiological and transcriptome studies have been performed in L. lactis to understand and improve its biotechnological assets. Using large amounts of transcriptome data to understand and predict the behavior of biological processes in bacterial or other cell types is a complex task. Gene networks make it possible to predict gene behavior and function in the context of transcriptionally linked processes. We reconstruct and present the gene co-expression network (GCN) for the most widely studied L. lactis strain, MG1363, using publicly available transcriptome data. Several methods exist to generate GCNs and to judge their quality. Different reconstruction methods lead to networks with varying structural properties, consequently altering gene clusters. We compared the structural properties of the MG1363 GCNs generated by five methods, namely Pearson correlation, Spearman correlation, GeneNet, Weighted Gene Co-expression Network Analysis (WGCNA), and Sparse PArtial Correlation Estimation (SPACE). Using SPACE, we generated an L. lactis MG1363 GCN and assessed its quality using modularity and structural and biological criteria. The L. lactis MG1363 GCN has structural properties similar to those of the gold-standard networks of Escherichia coli K-12 and Bacillus subtilis 168. We show that the network can be used to mine for genes with similar expression profiles that are also generally linked to the same biological process.
Lactococcus lactis MG1363 is a plasmid-free derivative of the dairy starter strain NCDO712 and is studied worldwide. Several genomes of L. lactis strains, including MG1363, have been sequenced to completion [2–4] and many regulons of L. lactis MG1363 are well characterized [5,6]. Still, the functions of many genes in its genome remain poorly understood. Reliable prediction and assignment of gene function remains a challenge deeply rooted in computational biological methods such as gene annotation and comparative genomics. Another option for gene prediction and function assignment is to construct gene co-expression networks (GCNs) [7–9]. A GCN is a graphical structure consisting of genes (depicted as nodes) and co-expression relationships (depicted as edges). The most connected nodes are the hubs, which generally correspond to genes encoding transcription factors (TFs) that drive the expression of the genes to which they are connected. Co-expression networks are used to characterize gene neighborhood relationships (commonly referred to as guilt-by-association), which can be used to identify genes/proteins with similar functions and/or physical interactions. A biologically meaningful network should be highly structurally organized, with clusters of genes (or modules) and genes connecting those clusters [12–15].
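To make the node, edge and hub terminology concrete, the following is a minimal sketch of a correlation-based co-expression network. It assumes plain Pearson correlation with a hard cutoff, which is the simplest of the five methods compared here and is not the SPACE procedure selected in this work. The gene names, expression values, the 0.9 threshold and the function names `build_gcn` and `hubs` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def build_gcn(expression, threshold=0.9):
    """Genes are nodes; an edge joins two genes whose |r| reaches the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    adjacency = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            adjacency[g1].add(g2)
            adjacency[g2].add(g1)
    return adjacency


def hubs(adjacency, top=1):
    """Rank genes by degree (number of edges); ties are broken by gene name."""
    return sorted(adjacency, key=lambda g: (-len(adjacency[g]), g))[:top]


# Hypothetical toy data: expression of four genes across five conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [1.2, 1.9, 3.1, 4.2, 4.8],
    "geneD": [5.0, 1.0, 4.0, 2.0, 3.0],
}
gcn = build_gcn(expression)
```

In this toy example geneA, geneB and geneC form a fully connected cluster, while geneD remains an isolated node. Methods based on partial correlation, such as GeneNet and SPACE, would additionally try to remove edges that are explained by a shared third gene.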
We have reconstructed and benchmarked the L. lactis MG1363 GCN using in-house and literature-derived transcriptome data. We analyzed the performance of five network reconstruction methods, namely Pearson correlation, Spearman correlation, WGCNA, GeneNet and SPACE, and found that SPACE yielded the best network for L. lactis MG1363, judged both by the structure of the network and by the biological content of the modules. The differences in network structure and corresponding parameters are attributed to the methods for computing the network adjacency matrices. Functional analyses demonstrated that the obtained network modules have biological relevance. Examination of the L. lactis MG1363 GCN shows that the members of some regulons are not all in the same module, an indication that genes in such regulons are regulated by multiple transcription factors in this organism as well. A list of differentially expressed genes obtained by DNA microarray analysis or RNA sequencing, or of proteins identified in proteomics experiments, can be projected on the L. lactis MG1363 GCN in order to uncover gene/protein function.
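One way to picture the projection of a gene list on the network is sketched below. The text does not specify how the projection is carried out, so this is only an assumed, simplified reading: query genes are grouped by the module they belong to, and a function is suggested for an uncharacterized gene from the most frequent annotation among its network neighbors (guilt-by-association). All gene names, module labels, annotations and function names are hypothetical.

```python
from collections import Counter, defaultdict


def project_on_modules(gene_list, module_of):
    """Group query genes by network module; genes absent from the network are skipped."""
    hits = defaultdict(list)
    for gene in gene_list:
        module = module_of.get(gene)
        if module is not None:
            hits[module].append(gene)
    return dict(hits)


def suggest_function(gene, adjacency, annotation):
    """Guilt-by-association: most frequent annotation among annotated neighbors.

    Returns None when no neighbor is annotated; ties are broken alphabetically.
    """
    labels = Counter(
        annotation[n] for n in adjacency.get(gene, ()) if n in annotation
    )
    if not labels:
        return None
    return min(labels, key=lambda label: (-labels[label], label))


# Hypothetical network fragment, module assignment and annotations.
adjacency = {
    "geneX": {"gene1", "gene2", "gene3"},
    "gene1": {"geneX"},
    "gene2": {"geneX"},
    "gene3": {"geneX"},
}
module_of = {"geneX": "M1", "gene1": "M1", "gene2": "M1", "gene3": "M2"}
annotation = {
    "gene1": "sugar metabolism",
    "gene2": "sugar metabolism",
    "gene3": "stress response",
}

# A hypothetical list of differentially expressed genes from one experiment.
de_genes = ["geneX", "gene3", "geneNotInNetwork"]
by_module = project_on_modules(de_genes, module_of)
predicted = suggest_function("geneX", adjacency, annotation)
```

Here the uncharacterized geneX falls in module M1 and inherits the annotation shared by most of its neighbors. A real analysis would normally add a statistical test for module enrichment instead of relying on raw counts.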