Date Published: October 12, 2010
Publisher: Public Library of Science
Author(s): Tulika Prakash, Vineet K. Sharma, Naoki Adati, Ritsuko Ozawa, Naveen Kumar, Yuichiro Nishida, Takayoshi Fujikake, Tadayuki Takeda, Todd D. Taylor, Pawel Michalak. http://doi.org/10.1371/journal.pone.0013284
Abstract: The ENCODE project has shown that almost every base of the human genome is transcribed. One class of the resulting transcripts arises from conjoined genes, which are formed by combining the exons of two or more distinct (parent) genes lying on the same strand of a chromosome. Only a very limited number of such genes are known, and the definitions and terminology used for them vary widely across public databases. In this work, we computationally identified and manually curated 751 conjoined genes (CGs) in the human genome that are supported by at least one mRNA or EST sequence in the NCBI database. Of 353 representative CGs subjected to experimental validation by RT-PCR and sequencing, 291 (82%) were confirmed. Because more than 70% of them are conserved in other vertebrate genomes, we speculate that these genes arise from novel functional requirements and are not merely artifacts of transcription. The unique splicing patterns exhibited by CGs point to possible roles in protein evolution or gene regulation. Novel CGs, for which no transcript is available, could be identified in 80% of randomly selected potential CG-forming regions, indicating that their formation is a routine process. CG formation is not limited to humans: using the same approach, we also identified 270 CGs in mouse and 227 in Drosophila. Additionally, we propose a novel mechanism for the formation of CGs. Finally, we developed a database, ConjoinG, which contains detailed information about all the CGs (800 in total) identified in the human genome. In summary, our findings provide new insights into the functionality of CGs as a possible additional mechanism of gene regulation and genomic evolution, and into the mechanism leading to their formation.
Partial Text: Eukaryotic transcription is a highly complex process, typically accomplished through the interaction of several proteins and regulatory sequences at different levels to generate a variety of gene products. The ENCODE project recently uncovered complex patterns of dispersed regulation and pervasive transcription in at least 1% of the human genome. Consequently, the long-standing conventional definition of a gene is fading, and it is now recognized that the genome is full of overlapping and other complex transcripts. One such intriguing example is the read-through transcript, also called a conjoined or co-transcribed gene (see Table S1 for a list of alternative and proposed names). A “conjoined gene” (CG) is defined as a gene that gives rise to transcripts by combining at least part of one exon from each of two or more distinct known (parent) genes that lie on the same chromosome, are in the same orientation, and often (95%) translate independently into different proteins. In some cases, CG transcripts are translated into chimeric or completely novel proteins. Currently, only 34 CGs are described in the NCBI Entrez Gene database, including well-known examples such as TRIM6-TRIM34 and NME1-NME2 (see http://metasystems.riken.jp/conjoing/faqs#ques2 for a complete list). This lack of annotation suggests either that CG formation is a rare phenomenon or that this type of gene has not yet been well characterized in the human genome because the genome annotation community lacks consensus on it. The use of different gene names for such transcripts further complicates their identification.
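The structural part of the CG definition can be expressed as a simple coordinate test. The sketch below flags a transcript as a candidate CG when its exons overlap exons of at least two distinct annotated genes on the same chromosome and strand. It is only a minimal illustration under stated assumptions: the `Exon`, `Gene`, and `Transcript` types, the helper names, and the gene names in the demo are hypothetical, and a plain exon-overlap check stands in for the authors' actual pipeline. That pipeline also required mRNA/EST support and manual curation, neither of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int  # assumed 1-based, inclusive coordinates
    end: int


@dataclass
class Gene:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    exons: list


@dataclass
class Transcript:
    chrom: str
    strand: str
    exons: list


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two closed intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def parent_genes(transcript: Transcript, genes: list) -> list:
    """Names of same-chromosome, same-strand genes that contribute at least
    part of one exon to the transcript (hypothetical helper)."""
    hits = []
    for gene in genes:
        # Parent genes must lie on the same chromosome and in the same orientation.
        if gene.chrom != transcript.chrom or gene.strand != transcript.strand:
            continue
        if any(overlaps(t, g) for t in transcript.exons for g in gene.exons):
            hits.append(gene.name)
    return hits


def is_conjoined(transcript: Transcript, genes: list) -> bool:
    """Candidate CG: exons drawn from two or more distinct parent genes."""
    return len(parent_genes(transcript, genes)) >= 2


if __name__ == "__main__":
    genes = [
        Gene("GENE_A", "chr1", "+", [Exon(100, 200), Exon(300, 400)]),
        Gene("GENE_B", "chr1", "+", [Exon(600, 700), Exon(800, 900)]),
        Gene("GENE_C", "chr1", "-", [Exon(650, 750)]),  # opposite strand
    ]
    read_through = Transcript("chr1", "+", [Exon(300, 400), Exon(600, 700)])
    print(parent_genes(read_through, genes))  # ['GENE_A', 'GENE_B']
    print(is_conjoined(read_through, genes))  # True
```

In the demo, the read-through transcript joins the last exon of `GENE_A` to the first exon of `GENE_B`. `GENE_C` overlaps one of its exons but is skipped because it lies on the opposite strand.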
The mammalian transcriptome is much more complex than previously thought. Several recent studies suggest that most of the mammalian genome is transcribed, yet thousands of transcripts do not encode proteins. These non-protein-coding genes, along with some CGs, play a variety of regulatory roles. In this analysis, we report the identification of 751 CGs and, for the first time, experimental confirmation of the existence of 82% (291 of 353 representatives) of them in 16 human tissues. Some of these CGs overlap with those identified in other studies (Figure 3), but the large majority were identified only by our method.