Date Published: October 17, 2018
Publisher: Public Library of Science
Author(s): Amin Mahpour, Benjamin S. Scruggs, Dominic Smiraglia, Toru Ouchi, Irwin H. Gelman, Tamar Juven-Gershon.
How TATA-less promoters, such as those within CpG islands (CGIs), control gene expression is still a subject of active research. Here, we have identified the “CGCG element”, a ten-base-pair motif with the consensus sequence TCTCGCGAGA that is present in a group of promoter-associated CGIs enriched in ribosomal protein and housekeeping genes. This element is evolutionarily conserved in vertebrates, is found in DNase-accessible regions, and employs RNA Pol II to activate gene expression. Through analysis of capped nascent transcripts and supporting evidence from reporter assays, we demonstrate that this element activates bidirectional transcription through divergent start sites. Methylation of this element abrogates the associated promoter activity. When the element coincides with a TATA-box, directional transcription remains CGCG-dependent. Because the CGCG element is sufficient to drive transcription, we propose that its unmethylated form functions as a heretofore undescribed promoter element of a group of TATA-less CGI-associated promoters.
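As an illustration of how the consensus sequence could be located in a promoter sequence, the following minimal Python sketch scans for exact matches. It is not the authors' pipeline: the function names are hypothetical, only exact matches to the consensus are considered, and degenerate variants of the motif are ignored. One property that can be checked directly is that TCTCGCGAGA is its own reverse complement, so an exact match on one strand is also a match on the opposite strand at the same coordinates.

```python
from typing import List

# Consensus sequence of the CGCG element as given in the text.
CGCG_CONSENSUS = "TCTCGCGAGA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = CGCG_CONSENSUS) -> List[int]:
    """Return 0-based start positions of exact (possibly overlapping) matches.

    Assumption: exact string matching only; no mismatches are tolerated.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    # The consensus is palindromic in the reverse-complement sense.
    print(reverse_complement(CGCG_CONSENSUS) == CGCG_CONSENSUS)
    # Toy sequence (made up for illustration) carrying two copies of the motif.
    toy = "AAA" + CGCG_CONSENSUS + "TTT" + CGCG_CONSENSUS
    print(find_motif(toy))
```

Because the motif reads the same on both strands, a single scan of the forward strand is enough to find every exact occurrence.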
Gene expression is one of the most critical, yet enigmatic, biological processes that defines cellular and organismal identity, and that mediates cellular response to internal and external stimuli. Importantly, dysregulation of this process is known to contribute to various human diseases such as cancer. With the discovery of RNA polymerases, the mechanisms of how transcription occurs have been extensively studied in many organisms. In contrast to the relatively simple prokaryotic transcriptional system, metazoan transcription is considerably more elaborate and involves complicated promoter structures, multiple functional DNA elements and a repertoire of specific general transcription factors. These factors and DNA elements are required to facilitate accurate transcriptional initiation, elongation, and termination [4–6].
In this study, we identify a novel promoter element that drives bidirectional transcription mainly in the context of TATA-less promoters. Although previous studies found sequences similar to this element in the promoters of individual genes, the functional role of the CGCG element in CGIs and TATA-less promoters in the human genome had not been explored [22–25]. Whereas other promoter elements (e.g. TATA and GC boxes) require an activator binding site to initiate directional transcription, a single instance of the CGCG element is both necessary and sufficient to promote bidirectional transcription. Moreover, unlike other known promoter elements that induce transcription, which typically occur once in most promoters, CGCG elements occur in multiple copies in a small percentage of CGI-containing promoters, a phenomenon that likely influences RNA polymerase recruitment and subsequent transcriptional rates.
In this study, we provide strong evidence that CGCG elements are evolutionarily conserved in vertebrates and function as an active component of CGI-associated promoters. The unmethylated form of the element may be sufficient to drive bidirectional transcription of TATA-less promoters. An interesting and important question for future studies is whether the CGCG element functions as a core promoter element or as a sequence-specific transcription factor binding site (SSTFBS). One argument for classifying it as a core promoter element is its ability to initiate local de novo bidirectional transcription from nearby Py/Pu(+1) sites in the absence of a core promoter element. This notion is supported by studies showing that SSTFBSs cannot initiate transcription in the absence of a core promoter element. However, similar to SSTFBSs, CGCG elements can occur in tandem copies, and the copy number modulates transcription intensity. Unlike traditional core promoter elements, the CGCG element is not positioned at a fixed distance from the TSS; instead, it is found within a range of 20 to 70 nucleotides (with a peak at approximately 50 nucleotides) upstream of the TSS on either strand, as determined by analysis of Start-seq and GRO-Cap datasets. It is also conceivable that CGCG elements modulate the induction of transcription by recruiting factors that could affect nucleosomal positioning in CGIs. Identification of the CGCG element-interacting factor will likely clarify whether the CGCG element functions as a core promoter element and what role it plays in driving transcription of housekeeping genes from CpG-rich and TATA-less promoters.
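The positional relationship described above (an element lying 20 to 70 nucleotides upstream of a TSS on either strand) can be sketched as a simple coordinate check. This is an illustrative sketch only, not the analysis applied to the Start-seq and GRO-Cap data: the function names are hypothetical, coordinates are assumed to be 0-based positions on the forward strand, and the distance is measured from an approximate motif center (start plus half the motif length), which is a convention chosen here rather than one stated in the text.

```python
def upstream_distance(motif_start: int, tss: int, strand: str,
                      motif_len: int = 10) -> int:
    """Distance in nucleotides from the motif center to a TSS.

    Positive values mean the motif lies upstream of the TSS relative to the
    direction of transcription; negative values mean it lies downstream.
    Assumption: the center is taken as motif_start + motif_len // 2.
    """
    center = motif_start + motif_len // 2
    if strand == "+":
        return tss - center
    if strand == "-":
        return center - tss
    raise ValueError("strand must be '+' or '-'")


def in_reported_window(distance: int, lo: int = 20, hi: int = 70) -> bool:
    """True if the distance falls in the 20 to 70 nt range given in the text."""
    return lo <= distance <= hi


if __name__ == "__main__":
    # Toy coordinates (made up): one motif flanked by divergent TSSs.
    motif_start = 100
    for tss, strand in [(155, "+"), (55, "-")]:
        d = upstream_distance(motif_start, tss, strand)
        print(strand, d, in_reported_window(d))
```

In the toy case, a single element sits upstream of both a forward-strand and a reverse-strand TSS, which is the divergent arrangement the text describes.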