Date Published: June 14, 2012
Publisher: Public Library of Science
Author(s): Yong-Zi Chen, Zhen Chen, Yu-Ai Gong, Guoguang Ying, John Parkinson
DOI: http://doi.org/10.1371/journal.pone.0039195
Sumoylation is one of the most essential reversible protein post-translational modifications and a crucial biochemical process in the regulation of a variety of important biological functions. Sumoylation is also closely involved in various human diseases. The accurate computational identification of sumoylation sites in protein sequences aids experimental design and mechanistic research in cellular biology. In this study, we introduced amino acid hydrophobicity as a parameter into a traditional binary encoding scheme and developed a novel sumoylation site prediction tool termed SUMOhydro. Using a support vector machine, the proposed method was trained and tested on a stringent non-redundant sumoylation dataset. In leave-one-out cross-validation, the proposed method performed well, with a correlation coefficient, specificity, sensitivity and accuracy of 0.690, 98.6%, 71.1% and 97.5%, respectively. In addition, SUMOhydro was benchmarked against previously described predictors on an independent dataset, and the results suggest that the introduction of hydrophobicity as an additional parameter could assist in the prediction of sumoylation sites. SUMOhydro is currently freely accessible at http://protein.cau.edu.cn/others/SUMOhydro/.
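The four performance measures quoted above can all be derived from a confusion matrix. The following is a minimal sketch, assuming that the reported correlation coefficient is the Matthews correlation coefficient (the usual choice for such predictors, but not stated explicitly here). The function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts.

    tp/fn count true sumoylation sites predicted as positive/negative;
    tn/fp count non-sumoylated lysines predicted as negative/positive.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the SUMOhydro results).
example = binary_metrics(tp=8, tn=85, fp=5, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because non-sumoylated lysines greatly outnumber sumoylated ones, accuracy and specificity can be high even when sensitivity is modest, which is why a correlation coefficient is reported alongside them.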
Sumoylation represents an important class of protein post-translational modifications (PTMs) in which a small ubiquitin-like modifier (SUMO) protein is covalently attached to a protein. By adding a SUMO protein to a substrate in a sequence-specific manner, protein sumoylation has the capacity to regulate multiple biochemical properties of the protein target, such as the stability, activity, intracellular localization and protein interactions. As such, sumoylation can play a critical functional role in various biological processes, including gene transcription and signal transduction. Because most SUMO substrates are localized in the nucleus, protein sumoylation might have significant effects on nuclear functions, and sumoylation has been shown to be correlated with DNA damage recovery, gene expression and chromosomal integrity. In addition, the functional importance of protein sumoylation is reflected in a variety of human diseases, including Alzheimer’s disease (AD), Parkinson’s disease (PD), viral infections and cancers.
A competitive sumoylation site predictor termed SUMOhydro was developed in the present study. We included amino acid hydrophobicity in a binary encoding scheme, and this hydrobinary encoding proved suitable for the prediction of sumoylation sites, giving SUMOhydro favorable performance relative to previously described predictors. The encoding not only characterizes the amino acids at the different positions surrounding a potential sumoylation site, but also captures a biochemical property at each of those positions. It is well known that more than two-thirds of the known sumoylation substrates contain the consensus motif ψKxE (where ψ is a large hydrophobic residue and x is any residue), suggesting that, in most cases, sumoylation targets substrate proteins at a specific position. Hence, we chose position-specific binary encoding as one part of our hydrobinary encoding approach. In addition, hydrophobicity has been shown to play a critical role in sumoylation site recognition. Therefore, the hydrobinary encoding is particularly suitable for the prediction of sumoylation sites. Although the overall performance of this new tool is not yet fully satisfactory, we expect that the hydrobinary encoding approach reported here will be useful for the further development of more successful sumoylation prediction systems, either by adopting additional state-of-the-art machine learning methods or by combining this technique with other encoding schemes. The SUMOhydro web server has been constructed to facilitate its use by the biological community, and it is freely accessible at http://protein.cau.edu.cn/others/SUMOhydro/. In conclusion, this tool has possible applications in proteome-wide sumoylation site prediction.
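To make the idea of combining position-specific binary encoding with hydrophobicity more concrete, the sketch below encodes a sequence window centered on a candidate lysine. It is an illustration under stated assumptions and not the SUMOhydro implementation: the window size, the padding character, the choice of the Kyte-Doolittle hydropathy scale, and the layout of one hydrophobicity value appended to each 20-dimensional one-hot vector are all assumptions, and the function names are hypothetical.

```python
# Standard 20 amino acids in a fixed order for one-hot (binary) encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (assumed scale; the scale used by
# SUMOhydro is not specified in this text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def lysine_windows(sequence, flank=3, pad="X"):
    """Yield (1-based position, window) for every lysine in the sequence.

    Each window has length 2 * flank + 1 with the lysine at its center;
    positions beyond the sequence ends are filled with the pad character.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i + 1, padded[i:i + 2 * flank + 1]


def hydrobinary_encode(window):
    """Encode a window as 21 values per position: one-hot plus hydrophobicity.

    Unknown or padding residues get an all-zero one-hot vector and a
    hydrophobicity of 0.0 (an assumption made for this sketch).
    """
    features = []
    for residue in window:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
        features.append(KYTE_DOOLITTLE.get(residue, 0.0))
    return features


# Hypothetical peptide containing a psi-K-x-E-like site (L-K-A-E).
for position, window in lysine_windows("MALKAEG", flank=2):
    vector = hydrobinary_encode(window)
    print(position, window, len(vector))
```

Feature vectors of this kind could then be passed to a support vector machine. Keeping the one-hot part position-specific preserves the information carried by the ψKxE motif, while the extra value per position lets the classifier generalize across residues with similar hydrophobicity.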