Date Published: February 9, 2017
Publisher: Public Library of Science
Author(s): Jeffrey R. Kugelman, Michael R. Wiley, Elyse R. Nagle, Daniel Reyes, Brad P. Pfeffer, Jens H. Kuhn, Mariano Sanchez-Lockhart, Gustavo F. Palacios, Kok Keng Tee.
Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic “no amplification” method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a “targeted” amplification method, sequence-independent single-primer amplification (SISPA) as a “random” amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced “no amplification” method, and Illumina TruSeq RNA Access as a “targeted” enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4 × 10⁻⁵) of all compared methods.
Describing the genomic composition of intra-host virus populations is becoming crucial for understanding disease progression, determining the effect of immune pressure on evolution of viral genotypes and phenotypes, optimizing vaccine design, and identifying virus genome mutations that may lead to resistance against medical countermeasures [1–5]. The characterization of viral genomic populations is especially important for RNA viruses, which, due to their relatively short genomes, have short replication times. Due to these short replication cycles and the error-prone viral RNA-dependent RNA polymerase, RNA viruses typically are subject to extreme evolutionary dynamics with very high mutation rates, thereby leading to large population sizes.
We compared sequencing errors due to RNA virus nucleic acid amplification and enrichment using different sample preparation methods. In particular, we analyzed the type and origin of errors in seven different libraries using two independent preparations (Fig 1).
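Because the input template has a known sequence, an error frequency can be estimated by counting mismatches between aligned reads and that reference. The following is a minimal sketch of the idea, not the VSALIGN pipeline used in the study: the function name `error_profile`, the ungapped `(start, sequence)` read format, and the toy reference are all assumptions made for illustration, and the sketch ignores indels, base-quality filtering, and any true variation present in the template.

```python
from collections import Counter


def error_profile(reference, reads):
    """Tally substitutions between aligned reads and a known reference.

    reference: reference sequence as a string.
    reads: iterable of (start, sequence) pairs, where start is the 0-based
           position of an ungapped alignment (hypothetical input format).
    Returns (error_rate, substitution_counts), where error_rate is the
    number of mismatched bases divided by the number of compared bases.
    """
    total_bases = 0
    substitutions = Counter()
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if base == "N":  # skip uncalled bases
                continue
            ref_base = reference[start + offset]
            total_bases += 1
            if base != ref_base:
                # Record the substitution type, e.g. "C>T"
                substitutions[f"{ref_base}>{base}"] += 1
    errors = sum(substitutions.values())
    rate = errors / total_bases if total_bases else 0.0
    return rate, substitutions


# Toy example: three reads against a 10-base reference, one mismatch (C>T).
reference = "ACGTACGTAC"
reads = [(0, "ACGTA"), (2, "GTACG"), (5, "CGTAT")]
rate, by_type = error_profile(reference, reads)
print(f"error rate: {rate:.3e}")
print(dict(by_type))
```

Grouping the counts by substitution type is one way to look at the type of error as well as its overall frequency. Comparing such profiles across libraries prepared by different methods would then indicate which preparation step contributes which errors.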
This study presents the deep sequencing results of a reporting system designed to describe acquisition of subclonal diversity resulting from sample preparation techniques commonly used to describe viral populations. Methods utilized in our viral re-sequencing pipeline software, VSALIGN, were designed to limit the originating diversity present in the plasmid so that acquired diversity (sample preparation error) can be tracked throughout. These methods were used to evaluate five different direct sequencing and amplification preparation techniques. The data obtained here are considered crucial for regulatory purposes, as low error rates of viral population analysis methods are necessary for the study and detection of viral adaptation during antiviral treatment.