Date Published: January 30, 2017
Publisher: Public Library of Science
Author(s): Benjamin M. Anderson, Kevin R. Thiele, Siegfried L. Krauss, Matthew D. Barrett, Genlou Sun.
Next-generation sequencing is becoming increasingly accessible to researchers asking biosystematic questions, but current best practice in both choosing a specific approach and effectively analysing the resulting data set is still being explored. We present a case study for the use of genotyping-by-sequencing (GBS) to resolve relationships in a species complex of Australian arid and semi-arid grasses (Triodia R.Br.), highlighting our solutions to methodological challenges in the use of GBS data. We merged overlapping paired-end reads then optimised locus assembly in the program PyRAD to generate GBS data sets for phylogenetic and distance-based analyses. In addition to traditional concatenation analyses in RAxML, we also demonstrate the novel use of summary species tree analyses (taking gene trees as input) with GBS loci. We found that while species tree analyses were relatively robust to variation in PyRAD assembly parameters, our RAxML analyses resulted in well-supported but conflicting topologies under different assembly settings. Despite this conflict, multiple clades in the complex were consistently supported as distinct across analyses. Our GBS data assembly and analyses improve the resolution of taxa and phylogenetic relationships in the Triodia basedowii complex compared to our previous study based on Sanger sequencing of nuclear (ITS/ETS) and chloroplast (rps16-trnK spacer) markers. The genomic results also partly support previous evidence for hybridization between species in the complex. Our methodological insights for analysing GBS data will assist researchers using similar data to resolve phylogenetic relationships within species complexes.
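The read-merging step mentioned above can be illustrated with a minimal sketch. It is not the tool or the settings used in the study: the function names, the `min_overlap` and `max_mismatch_frac` parameters, and their default values are hypothetical. The sketch reverse-complements the reverse read, looks for the longest suffix of the forward read that matches a prefix of it, and joins the two reads at that overlap. It ignores base qualities and assumes the sequenced fragment is at least as long as each read, so it does not handle adapter read-through.

```python
from typing import Optional

# Complement table for unambiguous bases plus N (illustrative; other IUPAC codes are not handled).
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def merge_pair(
    fwd: str,
    rev: str,
    min_overlap: int = 10,           # hypothetical default
    max_mismatch_frac: float = 0.1,  # hypothetical default
) -> Optional[str]:
    """Merge a forward and reverse read if their ends overlap.

    Returns the merged sequence, or None if no acceptable overlap is found.
    """
    fwd = fwd.upper()
    rev_rc = reverse_complement(rev)  # put the reverse read on the forward strand
    longest = min(len(fwd), len(rev_rc))

    # Try the longest candidate overlap first, then shrink.
    for overlap in range(longest, min_overlap - 1, -1):
        tail = fwd[-overlap:]
        head = rev_rc[:overlap]
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        if mismatches <= max_mismatch_frac * overlap:
            # Keep the forward read's bases in the overlap (a simplification;
            # quality-aware mergers choose the better-supported base).
            return fwd + rev_rc[overlap:]
    return None
```

Pairs for which `merge_pair` returns `None` would be treated as non-overlapping and kept as separate paired reads in a real workflow.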
Next-generation sequencing data sets are becoming increasingly accessible for addressing phylogenetic and biosystematic questions. Approaches to generating these data sets (reviewed in ) vary in their cost in terms of time and money, their requirement for existing genomic knowledge, their applicable evolutionary time (and hence taxonomic) scale, and the quality and sample coverage of the data. It is not always clear which approach will be most efficient and effective for addressing a given research question. In addition, analytical tools and approaches for resolving evolutionary relationships with genomic data are areas of ongoing research and testing [2–7], empirical application [8–12], and debate about best practice for phylogenetic inference [13,14].
Our approach to assembling and analysing GBS data highlights common bioinformatic challenges and reveals multiple ways to extract biologically meaningful signal from these data sets, especially in non-model systems. Our results also have implications for the biosystematics of the Triodia basedowii complex, with support for the recognition of multiple new species and greater resolution of relationships between species than was obtained in a previous study based on Sanger sequencing of a few loci.
By addressing GBS bioinformatic challenges, such as overlapping reads following imperfect size selection and paired-end read locus duplication, a biologically meaningful phylogenetic signal from across the genome can be extracted for study systems lacking prior genomic knowledge. The continuing improvement in next-generation sequencing read length holds promise for using analytical methods that rely on extracting phylogenetic signal from each locus, such as new species tree approaches. Here we have demonstrated their use with our assembled GBS loci and shown that they are more robust to variation in assembly parameters than commonly used concatenation approaches. Our GBS data and analyses have improved the resolution of relationships in the T. basedowii complex and provided new insights into processes influencing evolution of the group (e.g. polyploidy and partial introgression). These results will support an upcoming taxonomic revision of the complex, with the recognition of new species including one of conservation significance, and complement the findings of Anderson et al. with regard to the high diversity of the complex in the Pilbara region of Western Australia. We encourage other researchers working in similarly difficult taxonomic systems to use the relatively affordable and accessible GBS approach outlined here for generating and analysing genomic data.
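The contrast drawn above between concatenation and per-locus species tree input can be made concrete with a small sketch. It does not reproduce the PyRAD output format or the input format of any particular phylogenetics program: the data structure (one dictionary per locus, mapping sample name to aligned sequence), the function names, and the `min_samples` threshold are all hypothetical. Concatenation joins every locus into one supermatrix and pads absent samples with missing data, whereas a summary species tree workflow keeps each locus separate so that a gene tree can be estimated from it.

```python
from typing import Dict, List

Locus = Dict[str, str]  # sample name -> aligned sequence for one locus (assumed layout)


def concatenate(loci: List[Locus], missing: str = "N") -> Dict[str, str]:
    """Build a supermatrix: one concatenated sequence per sample.

    Samples absent from a locus are padded with the missing-data character.
    """
    samples = sorted({sample for locus in loci for sample in locus})
    parts: Dict[str, List[str]] = {sample: [] for sample in samples}

    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must be non-empty and aligned to a single length")
        width = lengths.pop()
        for sample in samples:
            parts[sample].append(locus.get(sample, missing * width))

    return {sample: "".join(chunks) for sample, chunks in parts.items()}


def per_locus_inputs(loci: List[Locus], min_samples: int = 4) -> List[Locus]:
    """Select loci to analyse separately, one gene tree per locus.

    The default of 4 reflects that an unrooted tree needs at least four tips
    to carry topological information; the threshold itself is an assumption.
    """
    return [locus for locus in loci if len(locus) >= min_samples]
```

In the concatenated matrix, loci with poor sample coverage contribute blocks of missing data, while in the per-locus route they are simply filtered out before gene trees are estimated.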