Date Published: July 3, 2017
Publisher: Public Library of Science
Author(s): Chengchen Zhao, Sheng’en Hu, Xiao Huo, Yong Zhang, Zhaohui Qin.
An increasing number of single cell transcriptome and epigenome technologies, including single cell ATAC-seq (scATAC-seq), have recently been developed as powerful tools to analyze the features of many individual cells simultaneously. However, existing methods and software were each designed for a single data type, and only for single cell transcriptome data. A systematic approach that covers epigenome data and multiple types of transcriptome data is needed to control data quality and to perform cell-to-cell heterogeneity analysis on these ultra-high-dimensional transcriptome and epigenome datasets. Here we developed Dr.seq2, a Quality Control (QC) and analysis pipeline for multiple types of single cell transcriptome and epigenome data, including scATAC-seq and Drop-ChIP data. The pipeline provides four groups of QC measurements and several analyses, including cell heterogeneity analysis. Dr.seq2 produced reliable results on published single cell transcriptome and epigenome datasets. Overall, Dr.seq2 is a systematic and comprehensive QC and analysis pipeline designed for parallel single cell transcriptome and epigenome data. Dr.seq2 is freely available at: http://www.tongji.edu.cn/~zhanglab/drseq2/ and https://github.com/ChengchenZhao/DrSeq2.
To better understand cell-to-cell variability, an increasing number of transcriptome technologies, such as Drop-seq [1, 2], Cyto-seq, 10x Genomics, and MARS-seq, and epigenome technologies, such as Drop-ChIP and single cell ATAC-seq (scATAC-seq), have been developed in recent years. These technologies can easily provide a large amount of single cell transcriptome or epigenome information at minimal cost, which makes it possible to analyze cell heterogeneity at the transcriptome and epigenome levels, deconstruct cell populations, and detect rare cell populations. However, each single cell transcriptome technology has its own features arising from its specific experimental design, such as cell sorting method, RNA capture rate, and sequencing depth, whereas existing methods and software, such as Dr.seq, were each developed for a single type of single cell data and offer only certain functions (S1 File). Furthermore, quality control is more challenging for single cell epigenome data than for transcriptome data because of the amplification noise caused by the limited number of DNA copies in single cell epigenome experiments, yet few quality control and analysis methods have been developed specifically for single cell epigenome data. Thus, a comprehensive QC pipeline suitable for multiple types of single cell transcriptome and epigenome data is urgently needed. Here, we provide Dr.seq2, a QC and analysis pipeline for multiple types of parallel single cell transcriptome and epigenome data, including recently published scATAC-seq data. Dr.seq2 can systematically generate data-type-specific QC measurements, perform analyses, and visualize unsupervised cell clustering for multiple types of single cell data. For single cell transcriptome data, the QC steps of Dr.seq2 are primarily derived from Dr.seq, and the output of Dr.seq2 on these data will not be described in detail in this paper.
In summary, Dr.seq2 is designed for QC and analysis of parallel single cell transcriptome and epigenome data. Parallel single cell transcriptome data generated by different technologies can be transformed into the standard input for Dr.seq2 using its built-in functions. Using the relevant commands, Dr.seq2 can also report quality measurements covering four aspects and generate detailed analysis results for scATAC-seq and Drop-ChIP datasets.