Research Article: Partially local three-way alignments and the sequence signatures of mitochondrial genome rearrangements

Date Published: August 23, 2017

Publisher: BioMed Central

Author(s): Marwa Al Arab, Matthias Bernt, Christian Höner zu Siederdissen, Kifah Tout, Peter F. Stadler.

http://doi.org/10.1186/s13015-017-0113-0

Abstract

Genomic DNA frequently undergoes rearrangement of the gene order that can be localized by comparing the two DNA sequences. In mitochondrial genomes different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. In order to study such effects it is important to locate the breakpoint positions with precision.

We define a partially local sequence alignment problem that assumes that, following a rearrangement of a sequence F, two fragments L and R are produced that may exactly fit together to match F, leave a gap of deleted DNA between L and R, or overlap with each other. We show that this alignment problem can be solved by dynamic programming in cubic space and time. We apply the new method to evaluate rearrangements of animal mitogenomes and find that a surprisingly large fraction of these events involved local sequence duplications.

The partially local sequence alignment method is an effective way to investigate the mechanism of genomic rearrangement events. While applied here only to mitogenomes, there is no reason why the method could not also be used to study rearrangements in nuclear genomes.

The online version of this article (doi:10.1186/s13015-017-0113-0) contains supplementary material, which is available to authorized users.

Partial Text

The small genomes of animal mitochondria, usually harbouring only 13 protein-coding genes as well as their own ribosomal and transfer RNAs, are subject to frequent rearrangements of the gene order. There does not seem to exist a unique molecular mechanism, however. Inversions [1] can be explained by inter-mitochondrial recombination [2, 3]. Similarly, transposition [4] and inverse transposition [5] may also be the result of nonhomologous recombination events [6, 7]. In a tandem duplication random loss (TDRL) event [8, 9], on the other hand, part of the mitogenome, which contains one or more genes, is duplicated in tandem; subsequently, one of the redundant copies of the genes is lost at random. Transpositions can also be explained by a TDRL mechanism, and there is at least evidence that rearrangements involving the inversion of genes can be explained by a duplication-based mechanism, where the duplicate is inverted [10]. It remains an open question how variable the rates and the relative importance of different rearrangement mechanisms are over longer evolutionary time-scales and in different clades. While TDRLs leave a clearly identifiable trace in the mitogenomic sequence, namely the usually rapidly decaying pseudogenized copies of redundant genes [10], little is known about the impact of other rearrangement mechanisms. It has been observed, however, that lineages with frequent rearrangements also show elevated levels of nucleotide sequence variation [11].

Fig. 1 Elementary rearrangement events discussed for mitogenomes. From left to right: inversion, transposition, inverse transposition, tandem duplication random loss. Pseudogenisation leading to eventual gene loss is indicated by symbols without borders (Adapted from [12] ©Elsevier)
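The effect of a TDRL event on the gene order can be illustrated with a small sketch. It works on gene names only, not on sequence, and the function names (`tdrl`, `random_tdrl`), the parameters, and the gene labels A to E are hypothetical choices made for this illustration; they are not taken from the article.

```python
import random


def tdrl(order, start, end, keep_first):
    """Apply one tandem duplication random loss event to a gene order.

    order      -- list of gene names (hypothetical labels)
    start, end -- half-open range [start, end) of the duplicated segment
    keep_first -- genes whose FIRST copy survives; for every other gene
                  in the segment the SECOND copy survives
    """
    segment = order[start:end]
    # After duplication the region reads: segment + segment.
    # Each gene then loses exactly one of its two copies.
    first_copy = [g for g in segment if g in keep_first]
    second_copy = [g for g in segment if g not in keep_first]
    return order[:start] + first_copy + second_copy + order[end:]


def random_tdrl(order, start, end, rng=random):
    """Same as tdrl, but the surviving copy of each gene is chosen at random."""
    keep_first = {g for g in order[start:end] if rng.random() < 0.5}
    return tdrl(order, start, end, keep_first)


if __name__ == "__main__":
    genes = ["A", "B", "C", "D", "E"]
    # Duplicate B C D, keep B and D from the first copy and C from the second.
    print(tdrl(genes, 1, 4, {"B", "D"}))
```

In this example the intermediate order is A B C D B C D E, and the losses leave A B D C E, which looks like a transposition of C. This matches the remark above that transpositions can also be explained by a TDRL mechanism.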

The alignment of the reference sequence F and its two offspring L and R is global at the “outer end” (here terminal deletions are scored), but local toward the breakpoint region (here terminal deletions in R and L, resp., remain unscored). Although the problem is symmetric, we follow the usual algorithmic design of dynamic programming algorithms for sequence alignments and consider partial solutions that are restrictions to prefixes of F, L, and R. In the following we denote by $m:=|F|$, $n:=|L|$, and $p:=|R|$ the respective lengths of the input sequences and by $S_{i,j,k}$ the maximal score of an alignment of the prefixes F[1..i], L[1..j], and R[1..k]. As usual, an index 0 refers to the empty prefix. We restrict our attention to additive scores defined on the input alphabet augmented by the gap character ('-'). We write $\gamma(a,b,c)$ for the score of the three-way parts and $\sigma(a,b)$ for the pairwise part, i.e., regions in which a suffix of L or a prefix of R remains unaligned. More details will be given at the end of this section.
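To make the notation concrete, the following minimal sketch fills a cubic table S[i][j][k] for a plain global three-way alignment. It is not the partially local recursion of the article, which is not reproduced in this excerpt. The sum-of-pairs form of γ, the numeric parameters of σ, and the function name `three_way_global_score` are assumptions made only for illustration.

```python
from itertools import product

GAP = "-"


def sigma(a, b, match=1, mismatch=-1, gap=-2):
    """Pairwise column score; the numeric values are illustrative assumptions."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def gamma(a, b, c):
    """Three-way column score, assumed here to be a sum of pairs."""
    return sigma(a, b) + sigma(a, c) + sigma(b, c)


def three_way_global_score(F, L, R):
    """Best score of a global alignment of F, L and R (cubic time and space)."""
    m, n, p = len(F), len(L), len(R)
    neg_inf = float("-inf")
    # S[i][j][k]: best score for the prefixes F[:i], L[:j], R[:k].
    S = [[[neg_inf] * (p + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    S[0][0][0] = 0
    # Each column consumes a character from at least one of the sequences,
    # which gives the seven non-zero 0/1 steps.
    moves = [d for d in product((0, 1), repeat=3) if any(d)]
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(p + 1):
                for di, dj, dk in moves:
                    if di > i or dj > j or dk > k:
                        continue
                    a = F[i - 1] if di else GAP
                    b = L[j - 1] if dj else GAP
                    c = R[k - 1] if dk else GAP
                    cand = S[i - di][j - dj][k - dk] + gamma(a, b, c)
                    if cand > S[i][j][k]:
                        S[i][j][k] = cand
    return S[m][n][p]


if __name__ == "__main__":
    print(three_way_global_score("ACGT", "ACGT", "ACGT"))
```

The table has (m+1)(n+1)(p+1) entries and each entry is filled in constant time, which is where the cubic bounds come from. The partially local model described above differs in that the unaligned suffix of L and prefix of R near the breakpoint are handled by the pairwise score σ instead of being forced into three-way columns.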

Figure 4 summarizes the distribution of overlap sizes. Several patterns are clearly visible. First, there is a substantial fraction of alignments with long gaps. The main cause of these long gaps is a very low nucleic acid sequence similarity, averaging only 39% between the gene portions (60 nt from start or end). This explains 10 of the 15 cases. Some of the remaining cases are explained by annotation errors such as the misannotation of trnY in Luvarus imperialis. In four alignments the long gaps are caused by the long intergenic region in the reference sequence F.

We have introduced a specialized three-way alignment model to study the breakpoint regions of mitochondrial genome rearrangements in detail. We observed several unexpected features. In particular, a substantial fraction of rearrangements involves duplication of genomic DNA, many of which have not been recognized as TDRL-like events. This in particular pertains to many events that have been classified as transpositions. On the other hand, some apparent TDRL events do not produce overlaps. While it is possible that in the case of ancient events the genomic sequences have diverged beyond the point where duplicated DNA is still recognizable, it is also possible that some of the ostensible TDRLs in fact correspond to possibly multiple rearrangements of other types. Clearly, further investigations into the individual cases will be necessary to resolve this issue completely. The present study at the very least adds to the evidence that multiple rearrangement mechanisms are at work and indicates that their classification is by no means a trivial task.

 
