Research Article: Rapid Design of Knowledge-Based Scoring Potentials for Enrichment of Near-Native Geometries in Protein-Protein Docking

Date Published: January 24, 2017

Publisher: Public Library of Science

Author(s): Alexander Sasse, Sjoerd J. de Vries, Christina E. M. Schindler, Isaure Chauvot de Beauchêne, Martin Zacharias, Heinrich Sticht.


Protein-protein docking protocols aim to predict the structures of protein-protein complexes based on the structures of the individual partners. Docking protocols usually include several steps of sampling, clustering, refinement and re-scoring. The scoring step is one of the bottlenecks in the performance of many state-of-the-art protocols. The performance of scoring functions depends on the quality of the generated structures and on their coupling to the sampling algorithm. A tool kit, GRADSCOPT (GRid Accelerated Directly SCoring OPTimizing), was designed to allow rapid development and optimization of different knowledge-based scoring potentials for specific objectives in protein-protein docking. Different atomistic and coarse-grained potentials can be created by a grid-accelerated, directly scoring-dependent Monte-Carlo annealing or by a linear regression optimization. We demonstrate that the scoring functions generated by our approach perform similarly to or even outperform state-of-the-art scoring functions for predicting near-native solutions. Of additional importance, we find that potentials specifically trained to identify the native bound complex perform rather poorly at identifying acceptable or medium quality (near-native) solutions. In contrast, atomistic long-range contact potentials can increase the average fraction of near-native poses by up to a factor of 2.5 in the best-scored 1% of decoys (compared to existing scoring), emphasizing the need for specific docking potentials for different steps in the docking protocol.

Partial Text

Protein interactions play a key role in almost all biological processes [1][2]. While the number of protein-protein interactions discovered by experimental and computational approaches rises rapidly, the number of known complex structures lags behind [3][4]. However, experimental structure determination methods such as nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography have been used successfully to determine many of the unbound constituents. Protein docking protocols aim to predict the structure of protein complexes from their unbound components. Docking protocols have been developed for single protein-protein, multiple protein, protein-peptide, protein-RNA and protein-DNA interactions [5][6][7][8]. State-of-the-art docking programs often achieve satisfactory results for sampling near-native docking geometries, particularly for cases with little or no structural change in each constituent during complex formation [9][10].

Generating knowledge-based scoring potentials with the GRADSCOPT tool kit involves the following steps (Fig 1). First, a benchmark is set up with a sampling protocol that generates an ensemble of decoys for each complex in it. The benchmark is then divided into a training set and a test set of complexes. Second, atom or coarse-grained residue types are assigned to the 3D structures of the receptor and the ligand. Based on this representation and the form of the desired interaction potential, potential-specific feature vectors are calculated for the generated decoys (see Methods subsection “Calculate potential-specific feature vectors”). Subsequently, the parameters of the potential are trained on a subset of decoys from the training complexes by a directly scoring-dependent Monte-Carlo annealing algorithm or by linear regression. Finally, the whole benchmark is re-scored using the feature vectors of the whole ensemble, and the scoring performance of the generated potential is then evaluated on a training-independent test set. This procedure can be performed in parallel or sequentially to generate several distinct scoring potentials in order to find the best-suited variant.
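
The training step can be illustrated with a minimal sketch of a Monte-Carlo annealing that optimizes a ranking objective directly on precomputed feature vectors. This is not the GRADSCOPT implementation: the linear form of the potential, the objective (number of near-native decoys among the best-scored `top_n`), the step size, the cooling schedule and all function names are assumptions chosen for illustration, and the decoys are synthetic toy data.

```python
import math
import random


def score(weights, features):
    """Linear potential (assumed form): weighted sum of per-type contact counts."""
    return sum(w * f for w, f in zip(weights, features))


def near_natives_in_top(weights, decoys, top_n):
    """Assumed objective: near-native decoys among the top_n lowest scores."""
    ranked = sorted(decoys, key=lambda d: score(weights, d[0]))
    return sum(1 for _, is_near_native in ranked[:top_n] if is_near_native)


def anneal(decoys, n_types, top_n, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Metropolis annealing over the weights, maximizing the ranking objective."""
    rng = random.Random(seed)
    weights = [0.0] * n_types
    current = near_natives_in_top(weights, decoys, top_n)
    best_weights, best = list(weights), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        # Perturb one randomly chosen weight.
        trial = list(weights)
        trial[rng.randrange(n_types)] += rng.gauss(0.0, 0.5)
        value = near_natives_in_top(trial, decoys, top_n)
        # Accept improvements always, deteriorations with Boltzmann probability.
        if value >= current or rng.random() < math.exp((value - current) / temperature):
            weights, current = trial, value
            if current > best:
                best_weights, best = list(weights), current
    return best_weights, best


def make_toy_decoys(n_decoys=200, n_types=3, seed=1):
    """Synthetic data: every 10th decoy is 'near-native' with extra type-0 contacts."""
    rng = random.Random(seed)
    decoys = []
    for i in range(n_decoys):
        is_near_native = i % 10 == 0
        features = [rng.randint(0, 10) for _ in range(n_types)]
        if is_near_native:
            features[0] += 8
        decoys.append((features, is_near_native))
    return decoys


decoys = make_toy_decoys()
baseline = near_natives_in_top([0.0, 0.0, 0.0], decoys, top_n=20)
weights, best = anneal(decoys, n_types=3, top_n=20)
print("near-natives in top 20 before training:", baseline)
print("near-natives in top 20 after training:", best)
```

Because the feature vectors are computed once, each trial set of weights only requires re-scoring and re-ranking the decoys, which is what makes optimizing the ranking directly affordable in this sketch. In a real setting the objective would be evaluated per training complex, and the final potential would be assessed on the held-out test complexes.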

The evaluation of the generated scoring potentials for protein-protein docking showed that both our approaches were able to rapidly create high-quality scoring potentials. All our potentials worked significantly better than random scoring; they matched or even outperformed two state-of-the-art functions in all three presented assessments. The continuous vdw-potentials performed extremely well in scoring the native structure but poorly in enriching near-native docking solutions. Very similar results were found for the popular Tobi score. These results supported our idea that vdw-potentials are extremely dependent on the distances of interface atoms in their training structures and hence are biased towards certain near-natives or natives in the decoy set.
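
Enrichment of near-native solutions can be quantified as the fraction of near-native decoys among the best-scored part of an ensemble, and two scoring functions can then be compared by the ratio of these fractions. The sketch below is illustrative only: the function name, the lower-score-is-better convention and the toy scores are assumptions, not the assessment code used in the article.

```python
def top_fraction_near_native(scores, is_near_native, fraction=0.01):
    """Fraction of near-native decoys among the best-scored part of the ensemble.

    Assumes lower scores are better; at least one decoy is always inspected.
    """
    n_top = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    hits = sum(1 for i in order[:n_top] if is_near_native[i])
    return hits / n_top


# Toy ensemble of 200 decoys; decoys 0, 1, 50 and 100 are near-native.
n_decoys = 200
is_near_native = [i in (0, 1, 50, 100) for i in range(n_decoys)]

# Hypothetical scoring A ranks decoys 0 and 1 best.
scores_a = [float(i) for i in range(n_decoys)]

# Hypothetical scoring B ranks decoys 0 and 5 best (decoys 1 and 5 swapped).
scores_b = list(scores_a)
scores_b[1], scores_b[5] = scores_b[5], scores_b[1]

frac_a = top_fraction_near_native(scores_a, is_near_native)
frac_b = top_fraction_near_native(scores_b, is_near_native)
print("near-native fraction in top 1%, scoring A:", frac_a)
print("near-native fraction in top 1%, scoring B:", frac_b)
print("improvement factor of A over B:", frac_a / frac_b)
```

For a benchmark, such fractions would be averaged over all test complexes before forming the ratio between two scoring functions.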



