Research Article: Genome Skimming: A Rapid Approach to Gaining Diverse Biological Insights into Multicellular Pathogens

Date Published: August 4, 2016

Publisher: Public Library of Science

Author(s): Dee R. Denver, Amanda M. V. Brown, Dana K. Howe, Amy B. Peetz, Inga A. Zasada, June L. Round.

http://doi.org/10.1371/journal.ppat.1005713

Partial Text

Genomic data acquisition is now trivial for biologists. Yet, moving from millions of sequence reads to an assembled and annotated genome continues to pose a daunting challenge. The first animal genome sequenced arose from the free-living model nematode Caenorhabditis elegans [1]. This venture provided an unprecedented foundation for new insights into genome function and ‘omics tool development. However, the C. elegans endeavor has been tough to repeat, even with the advent of new high-throughput DNA sequencing technologies. For example, the first plant-parasitic nematode (PPN) genomes were published ten years after the C. elegans genome [2,3], and only five publication-quality PPN genomes are presently available [4–6].

We applied our genome skimming strategy (Fig 1; see S1 Text) to six PPN species: Anguina agrostis, Globodera ellingtonae, Pratylenchus neglectus, P. penetrans, P. thornei, and Xiphinema americanum. Five of these species are in the “top ten” list of nematode plant pathogens [10]. Our approach begins like most genome projects by creating a single unrefined assembly for each PPN that provides a reference set of sequences for subsequent study. We deliberately skipped the lengthy downstream bioinformatics steps of typical genome projects. After completing single-pass assemblies, we examined the basic properties of the assembled contigs (Table 1). Assemblies yielded between ~10,000 and ~50,000 contigs per PPN, with average n-fold DNA sequence coverage values ranging from 7.7X to 30.4X. With an average coarse genome size estimate of 107.1 Mb and average GC content of 40.5%, the six PPN assemblies are consistent with known nematode genome size ranges [11,12]. We note that our smallest estimate (38.5 Mb) came from X. americanum, whose relative in the family Longidoridae, Longidorus kuiperi, also has a small genome size estimate of 56.5 Mb [13]. The N50 statistic, a common length-weighted summary of a set of sequences (the contig length at which contigs of that size or longer contain half of the assembled bases; see S1 Text for more detail), was 8,863 bp on average for the six PPN species analyzed. Since nematode genes average ~2–3 kb in length [1,11,12], the contigs resulting from our single-pass assembly are sufficiently long to be useful database resources for BLAST [14].
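
The N50 statistic mentioned above is easy to confuse with a plain average, so a minimal sketch of one common way to compute it may help. The function name `n50` and the contig lengths are hypothetical illustrations, not values or code from the study (see S1 Text for the authors' own description).

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from the paper.
contig_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
mean_length = sum(contig_lengths) / len(contig_lengths)
print(n50(contig_lengths), mean_length)
```

In this toy set the two long contigs hold half of the bases, so the N50 is well above the arithmetic mean. This is why N50 is generally reported alongside contig counts rather than in place of a mean length.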

Early genome sequencing initiatives focused on model organisms such as C. elegans, in which sequenced DNA came from highly inbred lab populations. Modern pathogen genomics, however, often requires analysis of natural populations in which numerous factors can lead to deviations from the genomic uniformity of an inbred lab culture. For example, pathogens may display population-level genetic variation, within-individual heterozygosity, and other deviations (e.g., polyploidy or interspecies hybridization). These pose potential challenges but also opportunities for discovery. Interspecies hybridization and the associated genome admixture are of increasing relevance to natural parasite populations [15]. Meloidogyne incognita, the world’s most devastating PPN species, evolved through between-species hybridization, as evidenced by recent phylogenomic analyses and the complex ploidy state of its nuclear genome [2,16]. The extent of hybridization among PPN species, however, remains unclear.

Discovery and functional characterization of effector genes, whose products directly engage in attacks on host defenses, are central aims of any pathogen genome project. Protein sequences for 10 effectors, well characterized in other PPN species (S2 Table), were used as TBLASTN queries to screen our PPN contig databases for homologous matches. Our search revealed 42 matches (out of 60 possible) distributed across the PPN genomes (Table 1). As expected, more hits were observed in the 5 tylenchid PPN species analyzed (ranging from 6 to 8) compared to the very distantly related X. americanum, in which only 3 hits were observed. These 3 genes (annexin, β-1,4-endoglucanase, peroxiredoxin) were found in all 5 of the other species studied; a previous study revealed evidence for an expressed endoglucanase effector in X. index [18], a congener of X. americanum. The 3 X. americanum hits had larger e-values (averaging 7.1 E-30) and shorter lengths (averaging 459 bp) than the averages for these 3 genes in the other 5 species (1.0 E-42, 632 bp), indicating weaker matches. The addition of a simple single BLAST step to our genome skimming strategy quickly revealed the presence of numerous putative effector genes in the PPN species, though follow-up experimentation and analysis remain necessary to evaluate whether or not bona fide effectors are encoded by the DNA sequences identified.
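
As a rough illustration of the kind of tally described above, the sketch below keeps the best hit per effector query from BLAST tabular output (the standard 12-column `-outfmt 6` layout). The e-value cutoff of 1e-5, the function name `best_hits`, and every row in `SAMPLE` are assumptions made for illustration. The source text does not state the threshold used, and none of these rows are results from the study.

```python
# Standard BLAST -outfmt 6 columns: qseqid sseqid pident length mismatch
# gapopen qstart qend sstart send evalue bitscore
# Hypothetical example rows, not output from the study.
SAMPLE = "\n".join([
    "annexin\tcontig_12\t48.2\t210\t100\t3\t1\t210\t5000\t5629\t1e-40\t150",
    "annexin\tcontig_77\t35.0\t120\t70\t4\t10\t129\t300\t659\t2e-10\t60",
    "peroxiredoxin\tcontig_3\t60.1\t180\t70\t1\t1\t180\t900\t1439\t3e-50\t200",
    "expansin\tcontig_9\t25.0\t50\t35\t2\t1\t50\t100\t249\t0.5\t25",
])


def best_hits(lines, max_evalue=1e-5):
    """Map each query to its lowest e-value hit at or below max_evalue.

    max_evalue is an assumed cutoff, not one taken from the paper.
    Returns {query: (subject, evalue, alignment_length)}.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        align_len, evalue = int(fields[3]), float(fields[10])
        if evalue > max_evalue:
            continue  # too weak to count as a putative match
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue, align_len)
    return best


hits = best_hits(SAMPLE.splitlines())
for query, (subject, evalue, align_len) in sorted(hits.items()):
    print(query, subject, evalue, align_len)
print(len(hits), "of 3 hypothetical queries matched")
```

Running the same tally once per species would give one count per contig database. As the text stresses, a hit of this kind marks only a putative effector, not a confirmed one.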

Bacterial endosymbionts, such as Wolbachia spp., are well known and widespread components of diverse arthropods. Genome sequencing efforts in filarial nematode species revealed the presence of Wolbachia, which functions as an obligate mutualist in these pathogens of animals and humans [19,20].

Genome skimming provides a rapid and affordable avenue for biological inquiry and hypothesis generation that avoids the time delays that accompany most genomic endeavors. A single-pass assembly followed by BLAST-based and other simple analyses revealed evidence for potential genomic hybridization, effector genes, and endosymbionts in the PPN genomes studied. Although genome skimming provides an effective approach to hypothesis generation, follow-up work remains necessary for hypothesis evaluation. Genome skimming alone will not suffice for biological questions requiring gene prediction and annotation (e.g., patterns of gene family expansion, instances of horizontal gene transfer). Nonetheless, our genome skimming pilot experiment provided quick and exciting biological insights and community genomic resources, essentially doubling the number of PPN species for which published genome sequence resources are available. How might our understanding of nematode pathogens change if genome skimming were applied to 600 PPN species instead of 6?

 


 
