Date Published: October 27, 2011
Publisher: BioMed Central
Author(s): Valery O Polyanovsky, Mikhail A Roytberg, Vladimir G Tumanyan.
Sequence alignment algorithms are the key instruments for computer-assisted studies of biopolymers. It is therefore important to take into account the “quality” of the obtained alignments, i.e. how closely the algorithms restore the “gold standard” alignment (GS-alignment), which superimposes positions originating from the same position in the common ancestor of the compared sequences. A 3D-alignment is commonly used as an approximation of the GS-alignment, although this choice is not entirely justified. Among the pair-wise alignment algorithms currently in use, the best quality is achieved by the algorithm of optimal alignment based on affine penalties for deletions (the Smith-Waterman algorithm). Nevertheless, the relative merits of the local and global versions of the algorithm have not been studied.
Using model series of amino acid sequence pairs, we studied how the relative “quality” of local and global alignments depends on (1) the relative lengths of the similar parts of the sequences (their “cores”) and of their nonhomologous parts, and (2) the relative positions of the core regions in the compared sequences. We obtained numerical values of the average quality (measured as accuracy and confidence) of the global and the local alignment methods for evolutionary distances between homologous sequence parts from 30 to 240 PAM, for core lengths from 10% to 70% of the total sequence length, and for all possible positions of the homologous parts relative to the centers of the sequences.
We identified criteria that specify the conditions under which the local or the global alignment algorithm is preferable, depending on the positions and relative lengths of the cores and nonhomologous parts of the sequences to be aligned. When the core of one sequence was positioned directly opposite the core of the other sequence (i.e. symmetrically), the global algorithm was more stable than the local algorithm at longer evolutionary distances and with larger nonhomologous parts. Conversely, when the cores were positioned asymmetrically, the local algorithm was more stable than the global algorithm at longer evolutionary distances and with larger nonhomologous parts. This opens the possibility of creating a combined method that generates more accurate alignments.
Pair-wise alignment of amino acid sequences is the main method of comparative protein analysis. The most popular algorithms based on comparison of protein primary structures include the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, BLAST, and FASTA. Building on earlier work, an algorithm was created for comparing sequences with intermittent similarities. Its improved version makes use of multiple parameter sets in computing an optimal alignment of the two sequences. A number of algorithms (Walquist et al., Litvinov et al., etc.) also take into account specific features of protein primary structures. However, it is important to know how closely algorithmic alignments, produced by optimizing a chosen target function, reflect the evolution-based alignment of the corresponding amino acid sequences, i.e. the one that juxtaposes the positions in the compared proteins originating from the same position in their common ancestor.
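The difference between the global (Needleman-Wunsch) and local (Smith-Waterman) formulations can be illustrated with a minimal Python sketch. It is not the implementation used in the study: for brevity it assumes a linear gap penalty instead of affine penalties and a simple match/mismatch score instead of a PAM substitution matrix, and the function name `align` and its parameters are hypothetical.

```python
def align(a, b, local=False, match=2, mismatch=-1, gap=-2):
    """Align sequences a and b; return (score, pairs).

    pairs lists the aligned (i, j) positions (0-based).
    Simplifying assumptions: linear gap penalty, identity scoring.
    """
    n, m = len(a), len(b)
    # score[i][j]: best score of an alignment ending at a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            v = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                    score[i - 1][j] + gap,     # gap in b
                    score[i][j - 1] + gap)     # gap in a
            if local:
                v = max(v, 0)                  # a local alignment may restart
                if v > best:
                    best, best_pos = v, (i, j)
            score[i][j] = v

    # Traceback: from the corner (global) or from the best cell (local).
    i, j = best_pos if local else (n, m)
    final = best if local else score[n][m]
    pairs = []
    while i > 0 and j > 0:
        if local and score[i][j] == 0:
            break                              # start of the local alignment
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return final, pairs


# Toy pair with a shared core "ACGTACGT" and asymmetric flanking parts.
seq_a = "TTTTACGTACGT"
seq_b = "ACGTACGTCCCC"
local_score, local_pairs = align(seq_a, seq_b, local=True)
global_score, global_pairs = align(seq_a, seq_b, local=False)
```

In this toy pair the shared core is shifted between the two sequences, so the local variant aligns only the core, whereas the global variant must also account for the flanking parts.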
We have analyzed how the quality (i.e. accuracy and confidence) of the global and local alignments of sequences with consoles (nonhomologous parts) depends on the following values: (1) the evolutionary distance between homologous fragments of sequences (“cores”); (2) the console length; and (3) console asymmetry (“shifted cores”).
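As an illustration of how such quality measures can be computed, the sketch below compares an algorithmic alignment with a reference alignment, both given as lists of aligned position pairs. The definitions used here are an assumption, since this summary does not state them: accuracy is taken as the fraction of reference pairs recovered by the algorithm, and confidence as the fraction of algorithmic pairs that are present in the reference. The function name `alignment_quality` is hypothetical.

```python
def alignment_quality(algorithmic, reference):
    """Return (accuracy, confidence) of an algorithmic alignment.

    Both arguments are iterables of aligned (i, j) position pairs.
    Assumed definitions (not taken from the text above):
      accuracy   = common pairs / pairs in the reference alignment
      confidence = common pairs / pairs in the algorithmic alignment
    """
    algo = set(algorithmic)
    ref = set(reference)
    common = len(algo & ref)
    accuracy = common / len(ref) if ref else 0.0
    confidence = common / len(algo) if algo else 0.0
    return accuracy, confidence


# Toy example: two of the four reference pairs are recovered,
# and one of the three algorithmic pairs is wrong.
reference = [(0, 0), (1, 1), (2, 2), (3, 3)]
algorithmic = [(0, 0), (1, 1), (2, 3)]
accuracy, confidence = alignment_quality(algorithmic, reference)
```

Under these assumed definitions the two measures behave like recall and precision: a short local alignment restricted to the core may have high confidence but low accuracy, and a long global alignment the reverse.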
The study has revealed regularities that define more exactly the areas in which each algorithm is effective. When consoles are positioned symmetrically, the global algorithm is more resistant than the local algorithm to increasing evolutionary distance and console length (by about 10% in accuracy and about 8% in confidence at 120 PAM, and by up to 20% in both accuracy and confidence at 240 PAM). Conversely, when consoles are asymmetrical, the local algorithm is more resistant than the global algorithm to increasing evolutionary distance and console length. The boundary of the global algorithm's preference is determined roughly by the degree of asymmetry in the position of the homologous fragments (cores) at which the density of the reference alignment is almost equal to the density of an alignment of random sequences. The mean divergence of 5-10%, which is typical of both the accuracy and the confidence of global and local alignments when the cores are positioned symmetrically, motivates the development of a combined method for producing a more reliable alignment.
The authors declare that they have no competing interests.
VOP performed all computations, participated in planning the research, and wrote the manuscript. MAR and VGT designed the study and contributed to the manuscript. All authors analyzed the results and read and approved the final manuscript.