Date Published: January 13, 2010
Publisher: Public Library of Science
Author(s): Sean Myles, Jer-Ming Chia, Bonnie Hurwitz, Charles Simon, Gan Yuan Zhong, Edward Buckler, Doreen Ware, Pär K. Ingvarsson. http://doi.org/10.1371/journal.pone.0008219
Abstract: Next-generation sequencing technologies promise to dramatically accelerate the use of genetic information for crop improvement by facilitating the genetic mapping of agriculturally important phenotypes. The first step in optimizing the design of genetic mapping studies involves large-scale polymorphism discovery and a subsequent genome-wide assessment of the population structure and pattern of linkage disequilibrium (LD) in the species of interest. In the present study, we provide such an assessment for the grapevine (genus Vitis), the world’s most economically important fruit crop. Reduced representation libraries (RRLs) from 17 grape DNA samples (10 cultivated V. vinifera and 7 wild Vitis species) were sequenced with sequencing-by-synthesis technology. We developed heuristic approaches for SNP calling, identified hundreds of thousands of SNPs and validated a subset of these SNPs on a 9K genotyping array. We demonstrate that the 9K SNP array provides sufficient resolution to distinguish among V. vinifera cultivars, between V. vinifera and wild Vitis species, and even among diverse wild Vitis species. We show that there is substantial sharing of polymorphism between V. vinifera and wild Vitis species and find that genetic relationships among V. vinifera cultivars agree well with their proposed geographic origins using principal components analysis (PCA). Levels of LD in the domesticated grapevine are low even at short ranges, but LD persists above background levels to 3 kb. While genotyping arrays are useful for assessing population structure and the decay of LD across large numbers of samples, we suggest that whole-genome sequencing will become the genotyping method of choice for genome-wide genetic mapping studies in high-diversity plant species. This study demonstrates that we can move quickly towards genome-wide studies of crop species using next-generation sequencing. 
Our study sets the stage for future work in other high diversity crop species, and provides a significant enhancement to current genetic resources available to the grapevine genetic community.
Partial Text: The aim of genetic mapping studies is to identify loci that underlie phenotypic variation. Genetic mapping studies are critical for improving crops through marker-assisted breeding and for our understanding of the relationship between genotype and phenotype. Genome-wide association (GWA) mapping and genomic selection (GS) are increasingly being adopted for crop improvement, and they often require large numbers of genetic markers. One of the main challenges in agricultural genetics is to access and use the tremendous genetic variation present in germplasm collections and in the wild, as crop species are far more diverse than the vertebrate systems used in biomedical research. To do this, approaches for applying next-generation sequencing technology to non-model systems need to be developed.
We generated reduced representation libraries (RRLs) from 17 grapevine DNA samples (10 cultivated V. vinifera varieties, 6 wild Vitis species and the reference genome, an inbred Pinot Noir; see Table S1 for details on samples) by digesting each sample with the restriction enzyme HpaII, which has proved useful in the generation of RRLs by others. The generation of RRLs permits high-coverage sequencing of a small, similar fraction of the genome across samples. Each RRL was sequenced on a single lane of Illumina’s Genome Analyzer to produce 57.3 million 36-bp reads (2.6 Gb of DNA sequence). We trimmed off the last 4 bases of each read and aligned the resulting 32-bp reads to the reference genome using ELAND (Illumina Inc.). In total, 68% of the reads successfully mapped to the reference genome: 57% mapped uniquely and 11% mapped to multiple locations (repetitive). The remaining 32% provided no alignment (no match). Figure 1 provides a summary of the alignment results and the proportion of reads carrying the HpaII sequence tag across the 17 samples.
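The trimming and read-classification steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: ELAND performed the actual alignment, and the function names, the per-read hit counts used as input, and the `CGG` tag (inferred here from HpaII cutting C^CGG) are all assumptions made for illustration.

```python
from collections import Counter

TRIM_BASES = 4     # from the text: the last 4 bases of each 36-bp read are trimmed
HPAII_TAG = "CGG"  # assumption: HpaII cuts C^CGG, so fragment ends start with CGG


def trim_read(read: str, n: int = TRIM_BASES) -> str:
    """Return the read with its last n bases removed."""
    return read[:-n] if n > 0 else read


def has_tag(read: str, tag: str = HPAII_TAG) -> bool:
    """Check whether a read begins with the (assumed) restriction-site tag."""
    return read.upper().startswith(tag)


def summarize_hits(hit_counts):
    """Classify reads by number of genome locations hit and return proportions.

    hit_counts is a hypothetical input: one integer per read giving how many
    places the aligner placed it (0 = no match, 1 = unique, >1 = repetitive).
    """
    tally = Counter()
    for n in hit_counts:
        if n == 0:
            tally["no_match"] += 1
        elif n == 1:
            tally["unique"] += 1
        else:
            tally["repetitive"] += 1
    total = sum(tally.values())
    if total == 0:
        return {"unique": 0.0, "repetitive": 0.0, "no_match": 0.0}
    return {k: tally[k] / total for k in ("unique", "repetitive", "no_match")}


if __name__ == "__main__":
    read = "CGG" + "A" * 33            # a made-up 36-bp read
    print(len(trim_read(read)))        # length after trimming
    print(has_tag(read))
    # Toy hit counts chosen to mirror the proportions reported in the text.
    print(summarize_hits([1] * 57 + [3] * 11 + [0] * 32))
```

With the toy input, the "unique" and "repetitive" proportions sum to the fraction of reads counted as mapped, which is how the 68% figure in the text decomposes into 57% and 11%.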
GWA and GS studies have generally concentrated on a small number of organisms with established genotyping arrays. With the decreasing costs of DNA sequencing and genotyping, we anticipate that there will be interest in moving rapidly towards GWA and GS studies in organisms for which relatively little genetic data currently exists. In plants in particular, the Germplasm Repositories of the United States Department of Agriculture currently house over 500,000 different accessions, a very large inventory of genetic variation waiting to be catalogued, discovered and used. In the present study, we provide a framework for moving rapidly and cost-effectively from very few genetic resources to genome-wide characterization of a species of great economic and cultural interest, the grapevine.