Date Published: February 27, 2015
Publisher: Public Library of Science
Author(s): J. Harry Caufield, Marco Abreu, Christopher Wimble, Peter Uetz, Christine A. Orengo
Abstract: Large-scale analyses of protein complexes have recently become available for Escherichia coli and Mycoplasma pneumoniae, yielding 443 and 116 heteromultimeric soluble protein complexes, respectively. We have coupled the results of these mass spectrometry-characterized protein complexes with the 285 “gold standard” protein complexes identified by EcoCyc. A comparison with databases of gene orthology, conservation, and essentiality identified proteins conserved or lost in complexes of other species. For instance, of 285 “gold standard” protein complexes in E. coli, fewer than 10% are fully conserved among a set of 7 distantly related bacterial “model” species. Complex conservation follows one of three models: well-conserved complexes, complexes with a conserved core, and complexes with partial conservation but no conserved core. Expanding the comparison to 894 distinct bacterial genomes illustrates fractional conservation and the limits of co-conservation among components of protein complexes: just 14 out of 285 model protein complexes are perfectly conserved across 95% of the genomes used, yet we predict more than 180 may be partially conserved across at least half of the genomes. No clear relationship between gene essentiality and protein complex conservation is observed, as even poorly conserved complexes contain a significant number of essential proteins. Finally, we identify 183 complexes containing well-conserved components and uncharacterized proteins, which will be interesting targets for future experimental studies.
Partial Text: Abundant genome sequencing has revealed an astounding diversity among bacterial genomes. Even species that inhabit the same environment may share only a fraction of their genes. This raises the question of how these organisms have adapted to their environments using only a limited number of genes. Here, we investigate the protein complements across bacterial genomes, how proteins are combined into protein complexes across species, and whether these complexes have been conserved across diverse branches of the prokaryotic tree of life.
The substantial variation among protein complexes across species supports the notion that these complexes are much more malleable than previously thought. A possible explanation for this is that the function of a complex is more important than its content. Complexes can serve the same role yet contain different proteins, and when one function is lost, others can fill the gap. Other studies have found that functional redundancy can lead to variation and that there is little overlap in protein interactions among species [2,3]. While mutational change in a protein complex may have catastrophic potential, complexes are not immutable. In fact, several complexes that are essential in some species have varying composition in other species. For instance, 5 out of 9 components of the E. coli Sec translocation complex (EcoCyc: SEC-SECRETION-CPLX) are well-conserved across species from P. aeruginosa to M. genitalium. One of these components, SecA, has been found to be essential in every species this work focuses on except S. sanguinis; orthologs of this protein are present in all 894 bacterial genomes examined. The remaining 4 E. coli components are more variably conserved across species. For instance, YajC is present in 727 of the same 894 genomes. Strong selection pressure appears to act against mutations that would render the entire complex ineffectual. This may explain why we have observed a higher level of conservation for protein complex components than for proteins in general (Fig. 1).
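The counts above can be read as simple presence fractions: SecA orthologs occur in 894 of 894 genomes (100%), while YajC orthologs occur in 727 of 894 (about 81%). The following is a minimal Python sketch of that kind of calculation, not the SPICEDNOG implementation. The function names, the gene labels (geneA, geneB, geneC), and the four toy genomes are all hypothetical and stand in for real ortholog presence/absence data.

```python
from typing import Dict, List, Set


def component_conservation(component: str, genomes: Dict[str, Set[str]]) -> float:
    """Fraction of genomes that contain an ortholog of one component."""
    if not genomes:
        raise ValueError("at least one genome is required")
    hits = sum(1 for orthologs in genomes.values() if component in orthologs)
    return hits / len(genomes)


def complex_conservation(
    components: List[str], genomes: Dict[str, Set[str]]
) -> Dict[str, float]:
    """For each genome, the fraction of the complex's components present."""
    if not components:
        raise ValueError("a complex needs at least one component")
    wanted = set(components)
    return {
        name: len(wanted & orthologs) / len(wanted)
        for name, orthologs in genomes.items()
    }


def fully_conserved_fraction(
    components: List[str], genomes: Dict[str, Set[str]]
) -> float:
    """Fraction of genomes in which every component of the complex is present."""
    if not genomes:
        raise ValueError("at least one genome is required")
    wanted = set(components)
    complete = sum(1 for orthologs in genomes.values() if wanted <= orthologs)
    return complete / len(genomes)


# Hypothetical toy data: ortholog sets for four made-up genomes.
toy_genomes = {
    "genome1": {"geneA", "geneB", "geneC"},
    "genome2": {"geneA", "geneB"},
    "genome3": {"geneA"},
    "genome4": {"geneA", "geneB", "geneC"},
}
toy_complex = ["geneA", "geneB", "geneC"]

per_component = {c: component_conservation(c, toy_genomes) for c in toy_complex}
per_genome = complex_conservation(toy_complex, toy_genomes)
complete = fully_conserved_fraction(toy_complex, toy_genomes)
```

In this toy set, geneA behaves like a universally present component and geneC like a variably conserved one, so the complex is complete in only half of the genomes even though one of its components is found everywhere. Representing each genome as a set of ortholog identifiers keeps both the per-component and per-complex measures to simple set operations.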
All data management was performed using in-house Python scripts (SPICEDNOG; available at http://github.com/caufieldjh/spicednog). Statistical analysis and clustering were performed using the R package vegan.