Research Article: A Catalog of Neutral and Deleterious Polymorphism in Yeast

Date Published: August 29, 2008

Publisher: Public Library of Science

Author(s): Scott W. Doniger, Hyun Seok Kim, Devjanee Swain, Daniella Corcuera, Morgan Williams, Shiaw-Pyng Yang, Justin C. Fay, Jonathan K. Pritchard

Abstract: Knowing the abundance and identity of functional variation segregating in natural populations is paramount to dissecting the molecular basis of quantitative traits as well as human genetic diseases. Genome sequencing of multiple organisms of the same species provides an efficient means of cataloging rearrangements, insertion or deletion polymorphisms (InDels), and single-nucleotide polymorphisms (SNPs). While inbreeding depression and heterosis imply that a substantial amount of polymorphism is deleterious, distinguishing deleterious from neutral polymorphism remains a significant challenge. To identify deleterious and neutral DNA sequence variation within Saccharomyces cerevisiae, we sequenced the genomes of a vineyard strain and an oak tree strain and compared them to a reference genome. Among these three strains, 6% of the genome is variable, mostly attributable to variation in genome content that results from large InDels. Out of the 88,000 polymorphisms identified, 93% are SNPs and a small but significant fraction can be attributed to recent interspecific introgression and ectopic gene conversion. In comparison to the reference genome, there is substantial evidence for functional variation in gene content and structure that results from large InDels, frame-shifts, and polymorphic start and stop codons. Comparison of polymorphism to divergence reveals scant evidence for positive selection but an abundance of evidence for deleterious SNPs. We estimate that 12% of coding and 7% of noncoding SNPs are deleterious. Based on divergence among 11 yeast species, we identified 1,666 nonsynonymous SNPs that disrupt conserved amino acids and 1,863 noncoding SNPs that disrupt conserved noncoding motifs. The deleterious coding SNPs include those known to affect quantitative traits, and a subset of the deleterious noncoding SNPs occurs in the promoters of genes that show allele-specific expression, implying that some cis-regulatory SNPs are deleterious. Our results show that the genome sequences of both closely and distantly related species provide a means of identifying deleterious polymorphisms that disrupt functionally conserved coding and noncoding sequences.

Partial Text: DNA sequence polymorphism makes a major contribution to phenotypic variation and provides a means by which natural selection can lead to microevolutionary change and divergence between species. Since the first methods were developed to systematically survey DNA polymorphism within species and divergence between species [1], there has been a long-standing effort to identify and characterize this variation. Currently, genome sequences have been generated for a wide range of species and comparative genomic methods have identified coding and noncoding sequences that are functionally conserved across distantly related species [2]–[7], and characterized the phylogenetic distribution of these sequences, which is not always constant [8]–[13]. Recently, more closely related genomes have been sequenced in order to identify and characterize DNA polymorphism and divergence within functional and nonfunctional sequences [14]–[18]. Although the focus on differences between closely related species poses new challenges to comparative genomics methods, such as accounting for alignment and sequencing error, the main challenge lies in distinguishing polymorphisms with phenotypic and fitness consequences from those that are inconsequential.

Genome sequencing of multiple organisms from the same species makes it possible to both catalog DNA polymorphism and identify variation with fitness and/or functional consequences. Using genome sequences of two strains of S. cerevisiae, we found variation in genome content, structure and sequence. Overall, there is substantial evidence for functional variation, based on disruption of sequences annotated in the reference genome, and deleterious variation, based on disruption of sequences conserved across other yeast species.
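
The idea of flagging a polymorphism as potentially deleterious because it disrupts a site conserved across other species can be illustrated with a toy sketch. This is not the authors' pipeline, and this excerpt does not give their conservation criteria. The function names, the requirement of complete identity across at least three species, and the short protein alignment are all hypothetical choices made for illustration.

```python
from collections import Counter

def conserved_residue(alignment, pos, min_species=3):
    """Return the residue at `pos` if it is identical in every species
    that has a non-gap character there, else None.
    `min_species` is an illustrative threshold, not a published value."""
    column = [seq[pos] for seq in alignment.values() if seq[pos] != "-"]
    if len(column) < min_species:
        return None
    residue, count = Counter(column).most_common(1)[0]
    return residue if count == len(column) else None

def flag_snps(snps, alignment):
    """Keep SNPs whose reference allele matches a fully conserved residue
    and whose alternative allele replaces it."""
    flagged = []
    for pos, ref, alt in snps:
        residue = conserved_residue(alignment, pos)
        if residue is not None and ref == residue and alt != residue:
            flagged.append((pos, ref, alt))
    return flagged

# Hypothetical aligned protein fragments from three other species.
alignment = {
    "species_A": "MKTAYL",
    "species_B": "MKSAYL",
    "species_C": "MKTAYL",
}

# Hypothetical nonsynonymous SNPs: (0-based position, reference, alternative).
snps = [(1, "K", "R"), (2, "T", "S"), (4, "Y", "H")]

flagged = flag_snps(snps, alignment)
print(flagged)
```

In this toy input, the SNP at position 2 is not flagged because the other species already differ at that site, whereas the SNPs at positions 1 and 4 replace residues that are identical in all three species.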