Research Article: Transcriptomics technologies

Date Published: May 18, 2017

Publisher: Public Library of Science

Author(s): Rohan Lowe, Neil Shirley, Mark Bleackley, Stephen Dolan, Thomas Shafee

Abstract: Transcriptomics technologies are the techniques used to study an organism’s transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst noncoding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. The first attempts to study the whole transcriptome began in the early 1990s, and technological advances since the late 1990s have made transcriptomics a widespread discipline. Transcriptomics has been defined by repeated technological innovations that transform the field. There are two key contemporary techniques in the field: microarrays, which quantify a set of predetermined sequences, and RNA sequencing (RNA-Seq), which uses high-throughput sequencing to capture all sequences. Measuring the expression of an organism’s genes in different tissues, conditions, or time points gives information on how genes are regulated and reveals details of an organism’s biology. It can also help to infer the functions of previously unannotated genes. Transcriptomic analysis has enabled the study of how gene expression changes in different organisms and has been instrumental in the understanding of human disease. An analysis of gene expression in its entirety allows detection of broad coordinated trends which cannot be discerned by more targeted assays.

Partial Text: Transcriptomics has been characterised by the development of new techniques which have redefined what is possible every decade or so and rendered previous technologies obsolete (Fig 1). The first attempt at capturing a partial human transcriptome was published in 1991 and reported 609 mRNA sequences from the human brain [1]. In 2008, two human transcriptomes composed of millions of transcript-derived sequences covering 16,000 genes were published [2][3], and, by 2015, transcriptomes had been published for hundreds of individuals [4][5]. Transcriptomes of different disease states, tissues, or even single cells are now routinely generated [5][6][7]. This explosion in transcriptomics has been driven by the rapid development of new technologies with improved sensitivity and economy (Table 1) [8][9][10][11].

Generating data on RNA transcripts can be achieved via either of two main principles: sequencing of individual transcripts (ESTs, or RNA-Seq), or hybridisation of transcripts to an ordered array of nucleotide probes (i.e., microarrays).

Transcriptomics methods are highly parallel and require significant computation to produce meaningful data for both microarray and RNA-Seq experiments. Microarray data are recorded as high-resolution images, requiring feature detection and spectral analysis. Microarray raw image files are each about 750 MB in size, while the processed intensities are around 60 MB in size. Multiple short probes matching a single transcript can reveal details about the intron-exon structure, requiring statistical models to determine the authenticity of the resulting signal. RNA-Seq studies can produce >10^9 short DNA sequences, which must be aligned to reference genomes composed of millions to billions of base pairs. De novo assembly of reads within a dataset requires the construction of highly complex sequence graphs. RNA-Seq operations are highly repetitious and benefit from parallelised computation, but modern algorithms mean consumer computing hardware is sufficient for simple transcriptomics experiments that do not require de novo assembly of reads. A human transcriptome could be accurately captured by using RNA-Seq with 30 million 100 bp sequences per sample [84][85]. This example would require approximately 1.8 gigabytes of disk space per sample when stored in a compressed fastq format. Processed count data for each gene would be much smaller, equivalent to processed microarray intensities. Sequence data may be stored in public repositories, such as the Sequence Read Archive (SRA) [86]. RNA-Seq datasets can be uploaded via the Gene Expression Omnibus.
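The storage figure quoted above can be sanity-checked with a back-of-the-envelope calculation. The Python sketch below is illustrative only: the function names are hypothetical, and the 40-character header line and 4:1 compression ratio are assumptions chosen for the example rather than values taken from the article. Real header lengths and compression ratios vary with the sequencing platform and the compressor used.

```python
def fastq_record_bytes(read_length, header_length=40):
    """Bytes in one uncompressed four-line FASTQ record.

    header_length is the assumed length of the '@...' header line
    (an illustrative value, not taken from the article).
    """
    header = header_length + 1   # header line plus newline
    sequence = read_length + 1   # one character per base plus newline
    separator = 2                # '+' plus newline
    quality = read_length + 1    # one quality character per base plus newline
    return header + sequence + separator + quality


def estimate_fastq_storage(n_reads, read_length,
                           header_length=40, compression_ratio=4.0):
    """Return (uncompressed_bytes, compressed_bytes) for a set of reads.

    compression_ratio is an assumed figure; actual ratios depend on
    the data and on the compression tool.
    """
    uncompressed = n_reads * fastq_record_bytes(read_length, header_length)
    compressed = uncompressed / compression_ratio
    return uncompressed, compressed


if __name__ == "__main__":
    # The article's example: 30 million reads of 100 bp per sample.
    raw, packed = estimate_fastq_storage(30_000_000, 100)
    print(f"uncompressed: {raw / 1e9:.2f} GB")
    print(f"compressed:   {packed / 1e9:.2f} GB")
```

Under these assumptions the estimate comes out close to the approximately 1.8 gigabytes cited in the text. Changing `header_length` or `compression_ratio` shows how sensitive the figure is to file-format details.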

Transcriptomics studies generate large amounts of data that have potential applications far beyond the original aims of an experiment. As such, raw or processed data may be deposited into public databases to ensure their utility for the broader scientific community (Table 5). For example, as of 2016, the Gene Expression Omnibus contained millions of experiments.

Transcriptomics has revolutionised our understanding of how genomes are expressed. Over the last three decades, new technologies have redefined what it is possible to investigate, and integration with other omics technologies is giving an increasingly integrated view of the complexities of cellular life. The plummeting cost of transcriptomics studies has made them possible for small laboratories, and large-scale transcriptomics consortia are able to undertake experiments comparing transcriptomes of thousands of organisms, tissues, or environmental conditions. This trend is likely to continue as sequencing technologies improve.