Research Article: Rapid and Accurate Prediction and Scoring of Water Molecules in Protein Binding Sites

Date Published: March 1, 2012

Publisher: Public Library of Science

Author(s): Gregory A. Ross, Garrett M. Morris, Philip C. Biggin, Peter Csermely.


Water plays a critical role in ligand-protein interactions. However, it is still challenging to predict accurately not only where water molecules prefer to bind, but also which of those water molecules might be displaceable. Displacing such water molecules is often seen as a route to optimizing the affinity of potential drug candidates. Using a protocol we call WaterDock, we show that the freely available AutoDock Vina tool can be used to predict accurately the binding sites of water molecules. WaterDock was validated using data from X-ray crystallography, neutron diffraction and molecular dynamics simulations and correctly predicted 97% of the water molecules in the test set. In addition, we combined data-mining, heuristic and machine learning techniques to develop probabilistic water molecule classifiers. When applied to WaterDock predictions in the Astex Diverse Set of protein-ligand complexes, we could identify whether a water molecule was conserved or displaced with an accuracy of 75%. A second model predicted whether water molecules were displaced by polar groups or by non-polar groups with an accuracy of 80%. These results should prove useful for anyone wishing to undertake rational design of new compounds where the displacement of water molecules is being considered as a route to improved affinity.

Partial Text

Water is a key structural feature of protein-ligand complexes and can form a complex hydrogen-bonding network between ligand and protein [1], [2]. Water-mediated binding is so common that a study of 392 protein-ligand complexes found that 85% had at least one water molecule bridging the interaction between the ligand and the protein [3]. Furthermore, the displacement of an ordered water molecule can drastically affect a ligand’s binding affinity [4], [5]. As a result, it is common to include explicit water molecules in computational drug design [6]–[8]. The careful consideration of hydration sites has been shown to aid the predictability of 3D QSAR models [9]–[11], ensure stable simulations with molecular dynamics [12], and improve the accuracy of rigorous free energy calculations [13]. Continuum solvent models have also been reported to improve with the addition of explicit water molecules [14]. Traditionally, ordered water molecules were ignored in ligand docking studies and ligands were docked into desolvated binding sites. There are now a number of docking protocols that include explicit water molecules and claim to improve accuracy in many cases [15]–[20]. However, it has also been reported that including such water molecules may hamper efforts to predict a ligand’s correct binding mode [21].