Research Article: Genomic variants identified from whole-genome resequencing of indicine cattle breeds from Pakistan

Date Published: April 11, 2019

Publisher: Public Library of Science

Author(s): Naveed Iqbal, Xin Liu, Ting Yang, Ziheng Huang, Quratulain Hanif, Muhammad Asif, Qaiser Mahmood Khan, Shahid Mansoor, Y-h. Taguchi.


The primary goal of cattle genomics is the identification of genome-wide polymorphisms associated with economically important traits. The bovine genome sequencing project was completed in 2009. Since then, using massively parallel sequencing technologies, a large number of Bos taurus cattle breeds have been resequenced and scanned for genome-wide polymorphisms. As a result, a substantial number of single nucleotide polymorphisms (SNPs) have been discovered across European Bos taurus genomes, whereas far fewer SNPs have been cataloged for Bos indicus breeds. In this study, we performed whole-genome resequencing, reference-based mapping, functional annotation and gene enrichment analysis of 20 sires representing eleven important Bos indicus (indicine) breeds of Pakistan. The breeds sequenced here include Sahiwal, Red Sindhi, Tharparkar and Cholistani (tropically adapted dairy and dual purpose breeds); Achai, Bhagnari, Dajal and Lohani (high altitude adapted dual and draught purpose breeds); and Dhanni, Hisar Haryana and Gabrali (dairy and light draught purpose breeds). A total of 17.4 billion QC-passed reads were produced on the BGISEQ-500 next generation sequencing platform, giving 9- to 27-fold genome coverage (average ~16×) for each of the 20 sequenced sires. A total of 67,303,469 SNPs were identified, of which 3,850,365 were novel, and 1,083,842 insertions-deletions (InDels) were detected across the sequenced genomes, of which 491,247 were novel. Comparative analysis using coding region SNPs revealed a close relationship between the best milking indicine breeds, Red Sindhi and Sahiwal. In contrast, Bhagnari and Tharparkar, both known for their adaptation to dry and extremely hot climates, shared the highest number of SNPs. Functional annotation identified a total of 3,194 high-impact (disruptive) SNPs and 745 disruptive InDels (in 275 genes) that may affect economically important dairy and beef traits.
Functional enrichment analysis revealed that high or moderate impact variants in the wingless-related integration site (Wnt) and vascular smooth muscle contraction (VSMC) signaling pathways were significantly over-represented in tropically adapted, heat tolerant Pakistani-indicine breeds. In contrast, the vascular endothelial growth factor (VEGF) and hypoxia-inducible factor 1 (HIF-1) signaling pathways were over-represented in highland adapted Pakistani-indicine breeds. Similarly, the ECM-receptor interaction and Jak-STAT signaling pathways were significantly enriched in dairy and beef purpose Pakistani-indicine cattle breeds. The Toll-like receptor signaling pathway was significantly enriched in most of the Pakistani-indicine cattle. This study therefore provides baseline data for further research to investigate the molecular mechanisms of major traits and to develop potential genomic markers associated with economically important breeding traits, particularly in indicine cattle.
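
As an illustration of what "significantly over-represented" can mean for a pathway, the sketch below computes a one-sided hypergeometric tail probability. This is a common choice for over-representation analysis, but the text above does not state which statistical test was used, so the method, the function name hypergeom_enrichment_p and the toy numbers are all assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_genes, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population    -- number of annotated genes in the background (assumed)
    pathway_genes -- number of background genes that belong to the pathway
    selected      -- number of genes carrying high/moderate impact variants
    overlap       -- number of selected genes that fall in the pathway
    """
    upper = min(pathway_genes, selected)
    total = comb(population, selected)
    # Sum the probability of seeing `overlap` or more pathway genes by chance.
    tail = sum(
        comb(pathway_genes, k) * comb(population - pathway_genes, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers only (not taken from the study): 20 background genes,
# 5 in the pathway, 4 variant-carrying genes, 2 of which are in the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(f"p = {p_value:.4f}")
```

In practice such p-values would be corrected for multiple testing across all pathways examined; that step is omitted here.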

Partial Text

Livestock accounts for an estimated 40% of worldwide agricultural output [1]. It provides employment to nearly 1.3 billion people worldwide and directly supports the livelihoods of 0.6 billion farmers in developing countries [2]. Domestic cattle play an important role in the agricultural economy of developing countries; their contribution goes beyond the direct production of milk and meat to skins, fiber, fertilizer and fuel production [3, 4].

This study presents an extensive genome analysis of eleven indigenous Pakistani cattle breeds following whole-genome resequencing on the BGISEQ-500 sequencing platform. The selected low to medium coverage resequencing method led to the detection of 67,303,469 SNPs and 1,083,842 InDels across all studied cattle samples. Depositing the novel SNPs in dbSNP has considerably increased the number of cataloged indicine variants, which could play an important role in developing a bias-free SNP array for genomic selection and genome-wide association studies in indicine cattle. A genome comparison of five major and geographically diverse indicine breeds from Pakistan, based on coding and regulatory SNPs, indicated that the second best milking breed, Red Sindhi, is closely related to the best milking breed, Sahiwal, with which it shares the highest number of SNPs (173,344). In contrast, the indicine breeds adapted to dry and extremely hot climates, Bhagnari and Tharparkar (white to gray coat color), shared the highest number of SNPs (254,633). Across all samples, variant function annotation revealed only 3,194 high impact (disruptive) SNPs and 745 high impact InDels. A total of 531 disrupted genes (256 harboring LoF SNPs and 275 harboring frameshift InDels) were used separately for GO enrichment analysis. The GO enrichment analysis revealed that most of the altered genes are significantly enriched in economically important biological processes, such as immune responses, heat tolerance, signaling pathways, cellular development and sensory perception. This study therefore provides baseline data for further research to reveal molecular mechanisms and identify potential genomic markers associated with economically important cattle traits for genomic selection (breeding), particularly in indicine breeds.
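
The breed comparison described above rests on counting SNPs shared between pairs of breeds. A minimal sketch of that counting step follows, assuming each breed's SNPs are available as a set of (chromosome, position, reference allele, alternate allele) tuples. The function name shared_snp_counts, the breed labels and the toy variants are hypothetical, and this is not presented as the pipeline the authors used.

```python
from itertools import combinations


def shared_snp_counts(snps_by_breed):
    """Return {(breed_a, breed_b): number of SNPs found in both breeds}."""
    return {
        (a, b): len(snps_by_breed[a] & snps_by_breed[b])
        for a, b in combinations(sorted(snps_by_breed), 2)
    }


# Toy data only: each SNP is (chromosome, position, ref allele, alt allele).
toy = {
    "BreedA": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")},
    "BreedB": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"), ("chr3", 7, "T", "C")},
    "BreedC": {("chr1", 250, "C", "T")},
}

counts = shared_snp_counts(toy)
# The pair with the largest intersection is the most similar by this measure.
closest_pair = max(counts, key=counts.get)
print(closest_pair, counts[closest_pair])
```

Raw shared counts depend on sequencing depth and on the number of animals per breed, so a comparison across samples with 9- to 27-fold coverage would usually need some normalization before the counts are interpreted as relatedness.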