Date Published: July 3, 2017
Publisher: Public Library of Science
Author(s): Joseph MEX Lucas, Hugues Roest Crollius, Sudhindra R. Gadagkar.
A conserved segment, i.e. a segment of chromosome unbroken during evolution, is an important operational concept in comparative genomics. Until now, algorithms designed to identify conserved segments have often returned synteny blocks that overlap, that include micro-rearrangements, or that are erroneously short. Here we present definitions of conserved segments and synteny blocks that are independent of any heuristic method, and we describe four new post-processing strategies to refine synteny blocks into accurate conserved segments. The first strategy identifies micro-rearrangements, the second identifies mono-genic conserved segments, the third returns non-overlapping segments and the fourth repairs incorrect ruptures of synteny. All these refinements are implemented in a new version of PhylDiag, which has been benchmarked against i-ADHoRe 3.0 and Cyntenator using a realistic simulation of evolution in which the true conserved segments are known.
Genomes are evolving molecules that are continuously mutating and rearranging. Despite these alterations, some segments of chromosomes remain exempt from disruption and still reflect the ancestral genome organisation; in 1984 they were first called “conserved segments” by Nadeau and Taylor. Identifying those conserved segments is a prerequisite in rearrangement studies. However, studies usually focus only on macro-rearrangements, in order to abstract away from the spurious micro-rearrangements pervasive in draft genome assemblies, and thus use synteny blocks rather than conserved segments. In 2003, Pevzner and Tesler introduced the term “synteny block” to refer to “segments that can be converted into conserved segments by micro-rearrangements. […] they usually consist of short regions of similarity that may be interrupted by dissimilar regions and gaps.” Studying synteny blocks, and more generally identifying the conservation of synteny, is the first step toward the identification of conserved segments from extant genomes. However, treating synteny blocks as a proxy for conserved segments is usually a mistake, since numerous real micro-rearrangements, unrelated to genome assembly errors, are scattered in extant genomes [3,4]. Furthermore, because the identification of synteny blocks relies by definition on the conservation of synteny relationships between at least two markers, synteny blocks systematically miss conserved segments containing only one marker, and thus cannot account for breakpoints corresponding to single-marker inversions. In this article we provide strategies to fine-tune the retrieval of conserved segments through the processing of synteny blocks.
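The point about single-marker segments can be made concrete with a toy example. The sketch below is not PhylDiag's algorithm; it assumes two single-chromosome genomes with the same gene content, no duplications, and genes encoded as signed integers (the sign standing for transcription orientation). The function name `conserved_segments` and the example genomes are hypothetical, introduced only for illustration.

```python
def conserved_segments(g1, g2):
    """Split g1 into maximal runs whose signed adjacencies also occur in g2.

    Toy model (an assumption of this sketch): genes are signed integers,
    each gene occurs exactly once in each genome.
    """
    if not g1:
        return []
    # Collect every adjacency of g2, read on both strands.
    adjacencies = set()
    for a, b in zip(g2, g2[1:]):
        adjacencies.add((a, b))
        adjacencies.add((-b, -a))  # same adjacency seen from the other strand
    segments = [[g1[0]]]
    for a, b in zip(g1, g1[1:]):
        if (a, b) in adjacencies:
            segments[-1].append(b)   # adjacency conserved: extend current run
        else:
            segments.append([b])     # breakpoint: start a new run
    return segments


genome_a = [1, 2, 3, 4, 5, 6]
genome_b = [1, 2, 3, -4, 5, 6]       # gene 4 has undergone a single-gene inversion

segments = conserved_segments(genome_a, genome_b)
# Keeping only runs of at least two markers mimics a synteny-block definition.
blocks = [s for s in segments if len(s) >= 2]

print(segments)  # [[1, 2, 3], [4], [5, 6]]
print(blocks)    # [[1, 2, 3], [5, 6]]
```

In this toy case the inverted gene forms a mono-genic conserved segment flanked by two breakpoints, and any procedure that requires at least two markers per block drops it.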
In whole-genome comparative genomics, small rearrangements are generally dismissed from consideration because they are difficult or impossible to distinguish from assembly and annotation errors. Yet the breakpoints of true small inversions may completely reshape a set of conserved segments that was detected without considering them. As a consequence, what may be thought to be a long conserved segment may prove to be several conserved segments of modest size once micro-inversions are considered. With improvements in the accuracy of genome assemblies and annotations, we believe that it might be time to start studying small rearrangements alongside macro-rearrangements. In this context, our method is the first to identify conserved segments down to mono-genic segments. It finds conserved segments with a level of accuracy that takes full advantage of knowing the transcription orientations of genes (Fig 5). Furthermore, it solves complex cases of synteny involving remnants of tandem duplications (Figs 3 and 4) as well as incorrect ruptures of synteny (Fig 6). Yet we show that the quality of conserved segment detection cannot reach 100%, since some evolutionary scenarios without breakpoints yield extant genomes similar to those produced by scenarios with breakpoints (Figs 9 and 10).
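The effect of gene orientation on the number of detected segments can be illustrated with a minimal sketch. It is an illustration under simplifying assumptions (one chromosome per genome, no duplications, genes as signed integers), not the method described in this article; `count_breakpoints` and its `use_orientation` flag are hypothetical names.

```python
def count_breakpoints(g1, g2, use_orientation=True):
    """Count adjacencies of g1 that are not conserved in g2.

    With use_orientation=False, gene signs are discarded and an adjacency
    is treated as an unordered pair of neighbouring genes.
    """
    if not use_orientation:
        g1 = [abs(g) for g in g1]
        g2 = [abs(g) for g in g2]
    adjacencies = set()
    for a, b in zip(g2, g2[1:]):
        adjacencies.add((a, b))
        # Reverse reading: flip signs only when orientation is tracked.
        adjacencies.add((-b, -a) if use_orientation else (b, a))
    return sum((a, b) not in adjacencies for a, b in zip(g1, g1[1:]))


genome_a = [1, 2, 3, 4, 5, 6, 7]
genome_b = [1, 2, -3, 4, 5, 6, 7]    # micro-inversion of gene 3

unsigned_breaks = count_breakpoints(genome_a, genome_b, use_orientation=False)
signed_breaks = count_breakpoints(genome_a, genome_b, use_orientation=True)

print(unsigned_breaks)  # 0: one apparent long segment
print(signed_breaks)    # 2: three segments, one of them mono-genic
```

Without orientations the two gene orders look identical, so the whole chromosome appears as a single conserved segment; with orientations the single-gene inversion introduces two breakpoints.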