Date Published: February 25, 2013
Publisher: BioMed Central
Author(s): Xuefeng Cui, Shuai Cheng Li, Dongbo Bu, Babak Alipanahi, Ming Li.
Previous studies show that bond lengths and bond angles of the same type fit Gaussian distributions with small standard deviations in high-resolution protein structure data. The mean values of these Gaussian distributions have been widely used as ideal bond lengths and angles in bioinformatics. However, we are not aware of any research that evaluates how accurately protein structures can be modeled with dihedral angles and ideal bond lengths and angles.
When studying the functions of a protein, it is crucial to know its three-dimensional structure, which consists of the Cartesian coordinates of all the atoms of the protein. These atoms are held together by inter-atomic forces called chemical bonds. It has been observed that bond lengths and angles of the same type follow a Gaussian distribution with a small standard deviation (STDEV) in high-resolution protein structure data. Typically, the bond lengths on protein backbones have STDEVs between 0.019 Å and 0.033 Å, and the bond angles on protein backbones have STDEVs between 1.5° and 2.7° [1,2]. These results suggest the possibility of modeling protein structures with the mean values of bond lengths and angles, which are often referred to as ideal values.
Given the target protein backbone structure, we would like to find the optimal idealized backbone structure. For an idealized protein backbone structure, the coordinates of the O, H and Cβ backbone atoms can be calculated from the coordinates of the N, Cα and C backbone atoms. Thus, we specifically describe how to generate the coordinates of the N, Cα and C atoms in this section. For simplicity, a structure always refers to a protein backbone structure unless otherwise specified.
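To illustrate how an atom position follows from dihedral angles and ideal bond lengths and angles, the sketch below places a fourth atom D from three already placed atoms A, B and C, given the C–D bond length, the B–C–D bond angle and the A–B–C–D dihedral angle. It is a minimal, generic internal-to-Cartesian conversion, not the authors' implementation; the function names (`place_atom`, `dihedral`) and the numeric values in the usage lines are hypothetical placeholders chosen for illustration only.

```python
import math

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    norm = math.sqrt(dot(u, u))
    return (u[0] / norm, u[1] / norm, u[2] / norm)

def place_atom(a, b, c, length, angle_deg, dihedral_deg):
    """Place atom D so that |CD| = length, angle(B, C, D) = angle_deg
    and dihedral(A, B, C, D) = dihedral_deg. A, B, C must not be collinear."""
    theta = math.radians(angle_deg)
    phi = math.radians(dihedral_deg)
    bc = unit(sub(c, b))                 # local x axis, along the B->C bond
    n = unit(cross(sub(b, a), bc))       # normal of the A-B-C plane
    m = cross(n, bc)                     # completes a right-handed frame
    # Coordinates of D in the local frame (bc, m, n) centred on C.
    x = -length * math.cos(theta)
    y = length * math.sin(theta) * math.cos(phi)
    z = length * math.sin(theta) * math.sin(phi)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))

def dihedral(a, b, c, d):
    """Signed dihedral angle A-B-C-D in degrees, used here as a consistency check."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(dot(cross(n1, n2), unit(b2)), dot(n1, n2)))

# Illustrative placeholder inputs (not values taken from the paper).
a, b, c = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)
d = place_atom(a, b, c, length=1.33, angle_deg=117.0, dihedral_deg=-60.0)
```

Applying such a placement repeatedly along the chain lets a whole backbone be rebuilt from its dihedral angles alone once the bond lengths and angles are fixed to their ideal values.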
After the backbone structure of the target protein has been idealized, we begin to idealize the side-chain structures. When doing this, the idealized backbone structure is considered to be rigid. This approach is widely accepted because previous research suggests that the backbone conformation is achieved before the side-chain conformations are achieved. After the side-chain idealization, we have a complete idealized protein structure in which both the backbone and the side-chain structures are idealized.
To study the protein structure idealization problem and its applications, we implemented our protein structure idealization algorithm. In our implementation, we use previously reported mean bond lengths and angles as the ideal bond lengths and angles, respectively. When idealizing the protein backbone structure, we set the search space radius of an atom to r=1.6Å and the discrete grid size to ε=r/5. We find that m=50,000 gives a reasonable balance between speed and accuracy. When idealizing the protein side-chain structure, we set the search space of a rotamer dihedral angle to be within 3σ of the mean value, where σ is the STDEV of the rotamer dihedral angle, and we set the discrete grid size to 10°. We also refine the idealized structure by iteratively reducing the search space and the discrete grid size by a constant factor of 0.5. Since finding the best scoring function for protein structure idealization is outside the scope of this paper, we set the weights w_a=1.0 for all a in our scoring function.
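The side-chain search settings described above can be sketched as a coarse-to-fine grid search over a single rotamer dihedral angle: start with a 10° grid inside the 3σ window around the mean, then repeatedly halve both the window and the grid size. This is only a sketch under stated assumptions. The function names, the `score` callback, the number of rounds, and the choice to re-centre each round on the current best candidate are hypothetical and not taken from the paper.

```python
def grid(center, half_width, step):
    """Evenly spaced candidates within [center - half_width, center + half_width]."""
    k = int(half_width // step)
    return [center + i * step for i in range(-k, k + 1)]

def refine_angle(score, mean, sigma, step=10.0, shrink=0.5, rounds=6):
    """Coarse-to-fine search for the dihedral angle (degrees) minimising `score`.

    `score` is a hypothetical callable mapping an angle to a cost (lower is better).
    The search never leaves the 3-sigma window around `mean`.
    """
    lo, hi = mean - 3.0 * sigma, mean + 3.0 * sigma
    center, half_width = mean, 3.0 * sigma
    for _ in range(rounds):
        candidates = [x for x in grid(center, half_width, step) if lo <= x <= hi]
        center = min(candidates, key=score)   # assumption: re-centre on the best candidate
        half_width *= shrink                  # reduce the search space by the constant factor
        step *= shrink                        # reduce the grid size by the same factor
    return center

# Illustrative use with a made-up quadratic cost whose optimum is at 67.3 degrees.
best = refine_angle(lambda x: (x - 67.3) ** 2, mean=60.0, sigma=10.0)
```

With these defaults the final grid size is 10° × 0.5⁵ = 0.3125°, so the returned angle lies within that resolution of the optimum for a well-behaved (single-minimum) score. A rugged score could lead the re-centring step into a local minimum, which is one reason the first grid is kept coarse and wide.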
We have introduced the protein structure idealization problem and made a first attempt to solve it. The experimental results show that idealized structures always exist with small changes to the coordinates. Furthermore, the idealized backbone structures have significantly better free energy and (Φ,Ψ) dihedral angle distributions. Therefore, protein structures can be modeled accurately with dihedral angles and ideal bond lengths and angles, and it is feasible to predict protein backbone and side-chain structures by searching the dihedral angle space.
The authors declare that they have no competing interests.
The algorithm implementation and all experiments were done by XC; the protein structure idealization algorithm was designed by XC and SCL; the CULLPDB_PC30_RES1.6_R0.25 experiment was designed by XC, SCL and DB; the NMR experiment was designed by XC and BA; the project was directed by ML. All authors read and approved the final manuscript.