Date Published: November 1, 2018
Publisher: Public Library of Science
Author(s): Amir Saberi, Anastasia A. Gulyaeva, John L. Brubacher, Phillip A. Newmark, Alexander E. Gorbalenya, Stanley Perlman.
RNA viruses are the only known RNA-protein (RNP) entities capable of autonomous replication (albeit within a permissive host environment). A 33.5 kilobase (kb) nidovirus has been considered close to the upper size limit for such entities; conversely, the minimal cellular DNA genome is in the 100–300 kb range. This large difference presents a daunting gap for the transition from primordial RNP to contemporary DNA-RNP-based life. Whether or not RNA viruses represent transitional steps towards DNA-based life, studies of larger RNA viruses advance our understanding of the size constraints on RNP entities and the role of genome size in virus adaptation. For example, emergence of the largest previously known RNA genomes (20–34 kb in positive-stranded nidoviruses, including coronaviruses) is associated with the acquisition of a proofreading exoribonuclease (ExoN) encoded in the open reading frame 1b (ORF1b) in a monophyletic subset of nidoviruses. However, apparent constraints on the size of ORF1b, which encodes this and other key replicative enzymes, have been hypothesized to limit further expansion of these viral RNA genomes. Here, we characterize a novel nidovirus (planarian secretory cell nidovirus; PSCNV) whose disproportionately large ORF1b-like region including unannotated domains, and overall 41.1-kb genome, substantially extend the presumed limits on RNA genome size. This genome encodes a predicted 13,556-aa polyprotein in an unconventional single ORF, yet retains canonical nidoviral genome organization and expression, as well as key replicative domains. These domains may include functionally relevant substitutions rarely or never before observed in highly conserved sites of RdRp, NiRAN, ExoN and 3CLpro. Our evolutionary analysis suggests that PSCNV diverged early from multi-ORF nidoviruses, and acquired additional genes, including those typical of large DNA viruses or hosts, e.g. Ankyrin and Fibronectin type II, which might modulate virus-host interactions. 
PSCNV’s greatly expanded genome, proteomic complexity, and unique features (impressive in themselves) attest to the likelihood of still-larger RNA genomes awaiting discovery.
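The size figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the authors' analysis: it assumes the rounded genome lengths given in the text (41.1 kb and 33.5 kb), three nucleotides per codon, and one stop codon. The constant and function names are illustrative only.

```python
# Figures quoted in the text; genome lengths are rounded to 0.1 kb,
# so the results below are approximations.
POLYPROTEIN_AA = 13_556        # predicted PSCNV polyprotein length (aa)
GENOME_NT = 41_100             # "41.1-kb" PSCNV genome (rounded)
PREVIOUS_LARGEST_NT = 33_500   # "33.5 kilobase" nidovirus genome (rounded)


def coding_nt(aa_count: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode aa_count residues at 3 nt per codon."""
    return 3 * aa_count + (3 if include_stop else 0)


orf_nt = coding_nt(POLYPROTEIN_AA)            # single-ORF length in nt
coding_fraction = orf_nt / GENOME_NT          # share of genome inside the ORF
size_ratio = GENOME_NT / PREVIOUS_LARGEST_NT  # PSCNV vs. previous record

print(f"ORF length: {orf_nt} nt")
print(f"Approximate coding fraction: {coding_fraction:.3f}")
print(f"Genome size relative to 33.5 kb: {size_ratio:.2f}x")
```

Under these assumptions the single ORF spans nearly the entire genome, which is consistent with the description of an unconventional single-ORF organization, and the genome is roughly a fifth larger than the previously cited 33.5-kb upper bound.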
Radiation of primitive life as it took hold on Earth was likely accompanied by genome expansion, which was associated with increased complexity and a proposed progression from RNA-based through RNA-protein to DNA-based life. The feasibility of an autonomous ancient RNA genome, and the mechanisms underlying such fateful transitions, are challenging to reconstruct. It is especially unclear whether RNA entities ever evolved genomes close to the 100–300 kilobase (kb) range [2, 3] of the “minimal” reconstructed cellular DNA genome. This range overlaps with the upper size limit of nuclear pre-mRNAs, which is likely the upper size limit for functional RNAs due to the relative chemical lability of RNA compared to DNA. However, pre-mRNAs are incapable of self-replication, the defining property of primordial genomic RNAs.
The advent of metagenomics and transcriptomics has greatly accelerated the pace of virus discovery, leading to studies reporting genome sequences of dozens to thousands of new RNA viruses in poorly characterized hosts [35, 36, 79, 120–126]. These developments have substantially advanced our appreciation of RNA virus diversity, and improved our understanding of the mechanisms of its generation [127, 128]. Notwithstanding that sea change, the largest known RNA genomes continue to belong to nidoviruses, as has been the case for 30 years, since the first coronavirus genome of 27 kb was sequenced [14, 21, 78] (Fig 1A).
Bioinformatics materials and methods are described in detail in S1 Materials and Methods.