Date Published: February 15, 2012
Publisher: BioMed Central
Author(s): Jafar Razmara, Safaai Deris, Sepideh Parvizpour.
In structural biology, similarity analysis of protein structure is a crucial step in studying the relationships between proteins. Despite the considerable number of techniques that have been explored within the past two decades, the development of new alternative methods is still an active research area due to the need for high-performance tools.
In this paper, we present TS-AMIR, a Topology String Alignment Method for Intensive Rapid comparison of protein structures. The proposed method works in two stages. In the first stage, the method generates a topology string based on the geometric details of secondary structure elements, and then applies an n-gram modelling technique based on the concept of entropy to capture similarities between these strings. The resulting initial correspondence map between secondary structure elements is passed to the second stage in order to obtain the alignment at the residue level. In the second stage, a heuristic step-by-step algorithm based on the Kabsch method aligns the residues, yielding an optimal rotation matrix and a minimized RMSD. The performance of the method was assessed in different information retrieval tests, and the results were compared with those of CE and TM-align, representing two geometrical tools, and YAKUSA, 3D-BLAST and SARST, representing linear encoding schemes. The method is shown to run at a speed similar to that of the linear encoding schemes. In addition, the method runs about 800 and 7200 times faster than TM-align and CE, respectively, while maintaining accuracy competitive with both.
The experimental results demonstrate that linear encoding techniques are capable of reaching the same high degree of accuracy as that achieved by geometrical methods, while generally running hundreds of times faster than conventional programs.
Today, biologists are faced with rapidly growing amounts of uncharacterized sequence and structure data in protein databases. With efficient analysis tools at hand, biologists are highly motivated to derive biological insights from these biomolecules. Sequence comparison tools are commonly used to determine the similarities between proteins with a high degree of similarity, whereas structure comparison methods are mainly used to highlight the evolutionary relationships among proteins. Additionally, the biological role of these macromolecules is considered to depend strongly on their 3D structure, which has driven interest in accurate and reliable structure comparison tools.
The TS-AMIR algorithm works in two stages. In the first stage, a correspondence map is built between the secondary structure elements of the two compared structures using text modelling techniques. The procedure builds a topology string from the geometry of the secondary structure elements of each structure, and then applies the n-gram modelling technique to find the best match between the two structures. The second stage uses a heuristic step-by-step algorithm to produce an alignment at the residue level by calculating a rotation matrix with the method suggested by Kabsch [26,27]. Detailed explanations of the method are described in the following sections.
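The superposition step at the core of the second stage can be illustrated with a minimal sketch of the standard Kabsch procedure. This is not the authors' implementation (which was written in Visual C++ and wraps the superposition in a heuristic residue-matching loop that is not reproduced here). The function name `kabsch`, the use of NumPy, and the assumption that the two coordinate sets are already paired residue by residue are all illustrative choices.

```python
import numpy as np

def kabsch(P, Q):
    """Superimpose paired points P onto Q (both N x 3).

    Returns (R, rmsd): the proper rotation matrix that best maps the
    centered P onto the centered Q, and the RMSD after superposition.
    Hypothetical helper for illustration only.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)

    # Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # 3 x 3 covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a rotation (det = +1),
    # not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # RMSD between the rotated P and Q, both centered.
    diff = (R @ Pc.T).T - Qc
    rmsd = float(np.sqrt((diff ** 2).sum() / len(P)))
    return R, rmsd
```

In a residue-level aligner, a function like this would typically be called repeatedly: a candidate set of residue pairs (for example, the C-alpha atoms of matched secondary structure elements) is superimposed, the pairing is revised using the new superposition, and the process repeats until the RMSD stops improving.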
The algorithm introduced above, called TS-AMIR, was implemented in Microsoft Visual C++ on MS-Windows XP. This section reports the results of experiments carried out to assess the performance of the method. The method was applied to different datasets and its outputs were compared with those of CE and TM-align, representing two powerful geometrical methods, and YAKUSA, 3D-BLAST and SARST, representing three well-known linear encoding methods.
We have developed a rapid protein structure alignment tool called TS-AMIR, a Topology String Alignment Method for Intensive Rapid Protein Structure Comparison, which combines a linear encoding scheme in the first stage with a geometry-based technique in the second stage. In terms of speed, the experimental results show that the method performs as well as linear encoding schemes. In addition, the method obtains results as accurate as those of the geometry-based approaches. This efficiency results from the simple and efficient techniques employed by the method.
The authors declare that they have no competing interests.
JR carried out the design and development of the algorithm, performed the experiments and their statistical analysis, and drafted the manuscript. SD carried out the design and supervision of the study. SP participated in data collection and analysis, and helped to draft the manuscript. All authors read and approved the final manuscript.