Date Published: July 19, 2017
Publisher: Public Library of Science
Author(s): Larry Xi, Alexander Belyaev, Sandra Spurgeon, Xiaohui Wang, Haibiao Gong, Robert Aboukhalil, Richard Fekete, Ruslan Kalendar.
A central challenge in sequencing single-cell genomes is to accurately determine point mutations, phase these mutations, and identify copy number variations with few assumptions. Ideally, this is accomplished at as low a sequencing coverage as possible. Here we report our attempt to meet these goals with a novel library construction and amplification methodology. In our approach, single-cell genomic DNA is first fragmented by saturated transposition to make a primary library of short fragments that uniformly covers the whole genome. The library is then amplified by a carefully optimized PCR protocol in a uniform and synchronized fashion for next-generation sequencing. Each step of the protocol can be quantitatively characterized. Our shallow sequencing data show that the library is tightly distributed and is useful for determining copy number variations.
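A "tightly distributed" library can be made quantitative with a coverage-uniformity metric. The following is a minimal Python sketch. It assumes reads have already been counted in fixed-size genomic bins, and the bin counts in the example are hypothetical. The sketch builds a Lorenz curve by sorting bins from least to most covered and accumulating the fraction of bins against the fraction of reads. It then summarizes the curve with a Gini coefficient. The Gini summary is our illustrative choice and not necessarily the statistic the authors report.

```python
from typing import List, Tuple


def lorenz_curve(bin_counts: List[int]) -> List[Tuple[float, float]]:
    """Return (cumulative fraction of bins, cumulative fraction of reads) points.

    Bins are sorted from least to most covered, so a perfectly uniform
    library lies on the diagonal y = x.
    """
    counts = sorted(bin_counts)
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one bin containing reads")
    points = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        points.append((i / n, running / total))
    return points


def gini(bin_counts: List[int]) -> float:
    """Gini coefficient = 1 - 2 * (area under the Lorenz curve), trapezoid rule.

    0 means perfectly even coverage; values near 1 mean reads pile up in few bins.
    """
    pts = lorenz_curve(bin_counts)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return 1 - 2 * area


if __name__ == "__main__":
    # Hypothetical per-bin read counts for two libraries.
    uniform = [5, 5, 5, 5]
    skewed = [0, 0, 0, 10]
    print(gini(uniform))  # 0.0
    print(gini(skewed))   # 0.75
```

With n bins the Gini coefficient cannot exceed 1 - 1/n. When comparing libraries, the bin size and number of bins should therefore be held fixed.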
Genetic variations that occur in single cells, such as single-nucleotide variations (SNVs) and copy number variations (CNVs), are driving forces in many biological processes, including evolution and cancer. Most current studies of genetic variation rely on bulk DNA sequencing, which provides only a coarse view of the average state of a population of cells. Bulk sequencing gives an adequate picture for germline studies or homogeneous systems. However, it works poorly for systems such as solid tumors, which are complex mixtures that can include noncancerous fibroblasts, endothelial cells, lymphocytes, and macrophages. The noncancerous cells can contribute more than 50% of the total extracted DNA, potentially masking important aberrations in the cancer cells. In addition, bulk sequencing cannot resolve the heterogeneity of cancer cells within tumors or the many genome-instability processes that shape tumor evolution over space and time [3, 4]. In contrast, single-cell approaches using next-generation sequencing (NGS) have yielded important insights into the key genomic features of various subpopulations and the evolution of cancer clones.
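A standard back-of-the-envelope calculation of the expected variant allele fraction (VAF) in bulk data makes the masking effect concrete. The simplifying assumptions here are ours, not the source's: a diploid locus with no copy number change, a heterozygous mutation, and equal DNA content per cell. Let p be the fraction of extracted DNA that comes from cancer cells, and let f be the fraction of cancer cells carrying the mutation (f = 1 for a clonal mutation). The mutant allele is then expected in roughly p·f/2 of reads:

```latex
\begin{align}
\mathrm{VAF} &= \frac{p\,f}{2} \\
p = 0.4,\; f = 1 &\;\Rightarrow\; \mathrm{VAF} = \frac{0.4 \times 1}{2} = 0.20 \\
p = 0.4,\; f = 0.25 &\;\Rightarrow\; \mathrm{VAF} = \frac{0.4 \times 0.25}{2} = 0.05
\end{align}
```

When noncancerous cells contribute more than half of the DNA (p < 0.5), even a clonal heterozygous mutation falls below 25% VAF. A subclonal one can drop to a few percent, where it may be hard to distinguish from sequencing errors at moderate depth.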
TnBC library construction involves two steps: saturated transposition and unbiased PCR amplification. Library quality is controlled by Ct (qPCR threshold cycle), product profile, duplication rate, the distribution of copy number (CN) states, and Lorenz curves. Shallow sequencing of the library can be used to reveal CNVs and as a QC step before committing to deep sequencing. With optimized protocols, we will apply the methodology to comparing cancer cells with normal cells.

In an ideal TnBC library, each haploid genome copy in a single cell generates one unique set of contiguous fragments. Each fragment in a set has a unique combination of 5ʹ and 3ʹ ends in reference-genome coordinates. As a result, the two alleles within a cell can be distinguished by their unique combinations of 5ʹ and 3ʹ ends, i.e., their UFIs (Fig 1). UFIs not only provide barcoding at the level of individual fragments; they can also potentially be used to link neighboring fragments. When fragments are amplified, their abundance may vary, but their UFIs remain unchanged (Fig 1), so any variation in fragment abundance can be normalized using UFIs. With deep sequencing, we expect to detect SNVs and obtain absolute copy number information.
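The UFI idea can be illustrated with a minimal Python sketch. PCR copies of one original fragment share the same 5ʹ and 3ʹ coordinates, so collapsing aligned reads on those coordinates recovers one count per unique fragment. That collapse yields both the duplication rate and per-bin counts that are insensitive to uneven amplification. Several details are hypothetical simplifications rather than the authors' pipeline: the `(chrom, five_prime, three_prime)` tuple standing in for a UFI, the function names, the bin size, and the example reads. Strand and mapping quality are ignored.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical UFI representation: (chromosome, 5' end, 3' end) of an aligned fragment.
Fragment = Tuple[str, int, int]


def collapse_by_ufi(fragments: Iterable[Fragment]) -> Counter:
    """Count how many reads support each unique fragment identifier (UFI)."""
    return Counter(fragments)


def duplication_rate(ufi_counts: Counter) -> float:
    """Fraction of reads that are extra copies of an already-seen UFI."""
    total = sum(ufi_counts.values())
    if total == 0:
        return 0.0
    return 1 - len(ufi_counts) / total


def unique_fragments_per_bin(ufi_counts: Counter, bin_size: int) -> Dict[Tuple[str, int], int]:
    """Count each UFI once, in the bin containing its 5' end, ignoring PCR copy number."""
    bins: Counter = Counter()
    for chrom, five_prime, _three_prime in ufi_counts:
        bins[(chrom, five_prime // bin_size)] += 1
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical aligned reads: the first fragment was amplified three times.
    reads = [
        ("chr1", 100, 350), ("chr1", 100, 350), ("chr1", 100, 350),
        ("chr1", 341, 560),
        ("chr2", 5000, 5300),
    ]
    ufis = collapse_by_ufi(reads)
    print(len(ufis))                               # 3 unique fragments
    print(duplication_rate(ufis))                  # about 0.4
    print(unique_fragments_per_bin(ufis, 1000))    # {('chr1', 0): 2, ('chr2', 5): 1}
```

Counting UFIs rather than reads per bin is what makes the bin counts robust to amplification bias: the three-fold-amplified fragment contributes the same single count as the others.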