Research Article: Characterization of bovine (Bos taurus) imprinted genes from genomic to amino acid attributes by data mining approaches

Date Published: June 6, 2019

Publisher: Public Library of Science

Author(s): Keyvan Karami, Saeed Zerehdaran, Ali Javadmanesh, Mohammad Mahdi Shariati, Hossein Fallahi, Sebastian D. Fugmann.


Genomic imprinting results in monoallelic expression of genes in mammals and flowering plants. Understanding the function of imprinted genes improves our knowledge of regulatory processes in the genome. In this study, we employed classification and clustering algorithms with attribute weighting to identify the attributes that distinguish imprinted (monoallelically expressed) genes from biallelically expressed genes. We obtained characteristics of 22 known monoallelically expressed (imprinted) and 8 biallelically expressed genes that have been experimentally validated, alongside 208 randomly selected genes, in bovine (Bos taurus). Attribute weighting methods and various supervised and unsupervised machine learning algorithms were applied. Distinguishing characteristics were discovered and used to separate monoallelically and biallelically expressed genes in bovine. Classification accuracy was estimated by 10-fold cross-validation for each combination of attribute weighting (feature selection) method and machine learning algorithm. Our approach accurately predicted monoallelic and biallelic genes using genomic and proteomic attributes.
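The evaluation scheme described above (10-fold cross-validation repeated for every pairing of an attribute weighting method with a learning algorithm) can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' pipeline: the two toy selectors, the two toy classifiers, the helper names and the 30/30 class sizes are all assumptions made for the example.

```python
import random
from itertools import product

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    offset = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[(offset + j) % k].append(i)
        offset += len(idxs)
    return folds

def top_by_mean_gap(X, y, n):
    """Toy attribute weighting: keep the n columns whose class means differ most."""
    a, b = sorted(set(y))
    def col_mean(c, j):
        vals = [row[j] for row, lab in zip(X, y) if lab == c]
        return sum(vals) / len(vals)
    gaps = [abs(col_mean(a, j) - col_mean(b, j)) for j in range(len(X[0]))]
    return sorted(range(len(gaps)), key=lambda j: -gaps[j])[:n]

def all_columns(X, y, n):
    """Baseline: no attribute selection."""
    return list(range(len(X[0])))

def sq_dist(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v))

def nearest_centroid(train_X, train_y, x):
    centroids = {}
    for c in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda c: sq_dist(centroids[c], x))

def one_nn(train_X, train_y, x):
    dists = [sq_dist(r, x) for r in train_X]
    return train_y[dists.index(min(dists))]

def cv_accuracy(X, y, select, classify, k=10, n_keep=2):
    """k-fold accuracy; attributes are selected on the training folds only."""
    correct = 0
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        cols = select(tr_X, tr_y, n_keep)
        tr_X = [[row[j] for j in cols] for row in tr_X]
        for i in test_idx:
            x = [X[i][j] for j in cols]
            correct += classify(tr_X, tr_y, x) == y[i]
    return correct / len(y)

# Synthetic data (assumption): columns 0-1 are informative, columns 2-4 are noise.
rng = random.Random(1)
X, y = [], []
for label, shift in (("imprinted", 1.5), ("biallelic", 0.0)):
    for _ in range(30):
        informative = [rng.gauss(shift, 1), rng.gauss(shift, 1)]
        X.append(informative + [rng.gauss(0, 1) for _ in range(3)])
        y.append(label)

selectors = {"mean_gap": top_by_mean_gap, "none": all_columns}
classifiers = {"centroid": nearest_centroid, "1nn": one_nn}
results = {}
for (s_name, s), (c_name, c) in product(selectors.items(), classifiers.items()):
    results[(s_name, c_name)] = cv_accuracy(X, y, s, c)
    print(s_name, c_name, round(results[(s_name, c_name)], 3))
```

Selecting attributes inside each training split, rather than once on the full data, keeps the held-out fold from influencing which attributes are kept; whether the original study did this is not stated in this text.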

Partial Text

Most diploid organisms, including mammals, receive two copies of each gene from their parents and express both alleles equally in their cells. For normal development, each individual needs to receive both the maternal and paternal genomes. For many genes in mammalian species, the maternal and paternal alleles are equally expressed. However, the expression of some genes is determined by imprinting, an epigenetic event in which the allele inherited from one of the parents is silenced and inactivated [1]. Consequently, in a limited group of imprinted genes, one of the parental alleles is expressed preferentially [2]. Imprinting, as an epigenetic mechanism, leads to monoallelic expression of some genes depending on the parent of origin of the allele [3]. Therefore, if the paternal allele of a gene is imprinted, the maternal allele is expressed, and vice versa. This results in unequal expression of the two alleles of a gene, which is in contrast to Mendelian genetics. The imprinting mechanism is largely conserved among mammalian species. Genomic imprinting leads to allele-specific gene expression [4, 5]. It has been shown that many human diseases, including Prader–Willi syndrome (PWS) [6], Beckwith–Wiedemann syndrome (BWS) [7] and some types of cancer [8, 9], are strongly associated with defective expression of imprinted genes. Large offspring syndrome (LOS) is an example of abnormal imprinting in bovine and ovine species that causes abnormally high growth rates and is phenotypically and epigenetically similar to BWS in humans [10]. This conservation between different organisms has greatly facilitated the study of imprinting mechanisms in some human genetic disorders [11]. In addition, the importance of imprinted genes is increasing because there is some evidence that imprinting defects are associated with complex disorders such as diabetes, obesity, developmental abnormalities and behavioral disorders.

The SVM dataset, with an average accuracy of 91.71% in induction models and 94.20% in Neural Network and Bayesian models, had the highest accuracy among the evaluated datasets. Therefore, this pattern could be better than the others at distinguishing imprinted from biallelically expressed genes. This dataset comprised the following attributes: length of CpG in 100 kb down, average length of CpG in 100 kb down, SINE 100 kb up, CpGi 3’-UTR, Ala, average length of CpG in gene region, SINE 10 kb up, CpGn 3’-UTR, average length of CpG 10 kb down and Pro/CCT.

According to our results, attributes related to GC content and CpG in the upstream and downstream regions of genes, SINE within 10 and 100 kb upstream, and the frequency of some amino acids, including Ala, Arg and Pro, were the most important attributes for distinguishing imprinted from biallelically expressed genes. The sequence characteristics presented in the current study predict the imprinting status of genes in bovine with high accuracy. This method could be applied to expand the number of known imprinted genes in the genomes of other species. With more imprinted genes in hand, it would be possible to deepen our understanding of the genetic and epigenetic regulatory mechanisms involved in the monoallelic expression of imprinted genes. In addition, assessing the method in other genomes might help to find evolutionary relationships among species and to find monoallelically expressed genes elsewhere. The next step would be to apply these patterns to the identification of novel sets of imprinted genes.
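As a rough illustration of the kinds of sequence-derived attributes named above (GC content, CpG-related measures and amino acid frequencies), the sketch below computes three generic quantities from a DNA string and a protein string. The exact attribute definitions used in the study (for example, the CpG lengths within 10 kb and 100 kb windows) are not given in this text, so the function names, the example sequences and the choice of the observed/expected CpG ratio are assumptions for illustration only.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both bases; report 0 here
    return seq.count("CG") * len(seq) / (c * g)

def amino_acid_frequency(protein):
    """Relative frequency of each one-letter amino acid code."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

# Made-up example sequences (not taken from the study)
dna = "ACGCGTTACGGA"
protein = "MAARPPA"

print(gc_content(dna))
print(cpg_obs_exp(dna))
print(amino_acid_frequency(protein)["A"])  # frequency of Ala
```

In practice such measures would be computed per gene over the gene body and its flanking windows, then assembled into one attribute row per gene before attribute weighting.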