Date Published: January 13, 2010
Publisher: Public Library of Science
Author(s): Heike Lux, Heiko Flammann, Mathias Hafner, Andreas Lux, Hiroaki Matsunami. http://doi.org/10.1371/journal.pone.0008686
Abstract: The paternally expressed gene PEG10 is a retrotransposon-derived gene, adapted through mammalian evolution, located on human chromosome 7q21. PEG10 codes for at least two proteins, PEG10-RF1 and PEG10-RF1/2, by −1 frameshift translation. Overexpression or re-induced expression of PEG10 has been observed in malignancies such as hepatocellular carcinoma or B-cell acute and chronic lymphocytic leukemia. PEG10 was also shown to promote adipocyte differentiation. Experimental evidence suggests that the PEG10-RF1 protein is an inhibitor of apoptosis and mediates cell proliferation. Here we present new data on the genomic organization of PEG10 by identifying the major transcription start site and a new splice variant, and we report the cloning and analysis of 1.9 kb of the PEG10 promoter. Furthermore, we show for the first time that PEG10 translation is initiated at a non-AUG start codon upstream of the previously predicted AUG codon as well as at the AUG codon. The finding that PEG10 translation is initiated at different sites adds a new aspect to the already interesting feature of PEG10’s −1 frameshift translation mechanism. It is now important to unravel the cellular functions of the PEG10 protein variants and how they are related to normal or pathological conditions. The generated promoter-reporter constructs can be used in future studies to investigate how PEG10 expression is regulated. In summary, our study provides new data on the genomic organization as well as the expression and translation of PEG10, a prerequisite for studying and understanding the role of PEG10 in cancer, embryonic development and normal cell homeostasis.
Partial Text: In 2004, the International Human Genome Sequencing Consortium published an analysis and annotation of the nearly completed human genome sequence (∼99%) with an estimated number of 20,000–25,000 protein-coding genes. Despite this surprisingly low number of coding genes, the complexity of the proteome is generated in part by alternative splicing. Alternative splicing gives rise to a varying number of mRNAs coding for a set of one to several differentially assembled proteins originating from one gene. The imprinted human gene “Paternally Expressed Gene” 10 (PEG10) and its mouse ortholog Peg10/Edr use a different mechanism, coding for more than one protein by −1 ribosomal frameshift translation, which is well known from retroviruses and retrotransposons. PEG10 and the human paraneoplastic antigen gene MA3 are, to our knowledge, currently the only two human genes known to use this mechanism.
Recent reports suggest that PEG10 has important functions in cell proliferation, differentiation, apoptosis and the development of cancer. However, it is not well known how the PEG10 proteins influence these functions or how PEG10 expression and translation are regulated. To understand the molecular mechanism of transcription of a gene, it is essential to identify the corresponding promoter. Determination of the transcription start site (TSS) is the first step in identifying the promoter. Here, we determined what we consider to be the major TSS (mTSS) and also showed the existence of alternative TSSs (aTSSs) further upstream. This is in agreement with studies reporting that many eukaryotic genes do not use a single TSS but that transcription can start from different sites. For example, Suzuki and colleagues analysed 276 human genes and found that the TSSs are spread over a region of 61.7 bp on average for genes with and without a TATA-box. However, for genes with a TATA-box the TSSs were more tightly clustered. A more recent study performed a genome-wide analysis of mammalian promoters, which confirmed the presence of aTSSs for the majority of genes and defined classes of promoters according to the presence and usage of TATA-box, CCAAT-box, GC-box and CpG islands in the context of TSSs. Four categories of promoters and TSSs were defined: (i) a single dominant peak TSS class (SP) with a single dominant TSS, (ii) a general broad TSS distribution (BR), (iii) a broad TSS distribution with a dominant TSS (PB) and (iv) bi- or multimodal TSSs (MU). Based on our data, the PEG10 promoter may belong to the PB class. Two further criteria support the PEG10 mTSS defined here. First, it is preceded by a TATA-box (100% consensus sequence) positioned at the ideal distance of 24–30 nucleotides.
Second, the sequence around the TSS, G>T(−1)>A(+1)>C>A>C>G, conforms closely to the Py>Py>A(+1)>N>T/A>Py>Py sequence of initiator elements (Inr), in which the initiation site shows the preferred pyrimidine-purine dinucleotide. The spacing of the TATA-box and Inr would allow these two elements to act synergistically; this synergy is less efficient or negligible at distances of more than 30 bp. It is interesting to note that the sequences of the TATA-box and Inr element, and the spacing between the two, are highly conserved among various species, i.e. human, mouse, bovine and dog.
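
The comparison between the sequence around the mTSS and the Inr consensus can be checked position by position. The following is a minimal sketch, not part of the original study: it assumes the seven bases quoted in the text are read 5'→3' as positions −2 to +5 (there is no position 0), and the function name `inr_matches` and the encoding of the consensus as allowed-base sets are illustrative choices.

```python
# Inr consensus Py-Py-A(+1)-N-T/A-Py-Py, written as the set of bases
# allowed at each position (Py = C or T, N = any base).
INR_CONSENSUS = ["CT", "CT", "A", "ACGT", "AT", "CT", "CT"]


def inr_matches(site: str):
    """Return, per position, whether `site` agrees with the Inr consensus."""
    site = site.upper()
    if len(site) != len(INR_CONSENSUS):
        raise ValueError("expected a 7-nucleotide window (positions -2 to +5)")
    return [base in allowed for base, allowed in zip(site, INR_CONSENSUS)]


# Sequence around the PEG10 mTSS as quoted in the text (assumed 5'->3').
peg10_site = "GTACACG"
flags = inr_matches(peg10_site)
print(sum(flags), "of", len(flags), "positions match the Inr consensus")
```

Under these assumptions, the central positions −1 to +4, including the pyrimidine-purine dinucleotide at the initiation site, agree with the consensus, while the two outermost positions (G at −2 and G at +5) do not.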