Date Published: March 01, 2018
Publisher: International Union of Crystallography
Author(s): Oliver S. Smart, Vladimír Horský, Swanand Gore, Radka Svobodová Vařeková, Veronika Bendová, Gerard J. Kleywegt, Sameer Velankar.
Better metrics are required to assess small-molecule ligands in macromolecular structures in Worldwide Protein Data Bank (wwPDB) validation reports. The local ligand density fit (LLDF) score currently used to identify ligand electron-density fit outliers produces a substantial number of false positives and false negatives.
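As a concrete illustration of what the LLDF score measures, the sketch below assumes the usual description of the metric: the ligand's real-space R value (RSR) is compared with the RSR values of the polymer residues around it and expressed as a Z-score. Several details here are assumptions rather than the wwPDB implementation: how the neighbouring residues are selected, the use of the sample standard deviation, and the outlier cut-off of 2 (`LLDF_OUTLIER_THRESHOLD`). The function names are hypothetical.

```python
import numpy as np

# Assumed cut-off: ligands whose LLDF exceeds this are treated as outliers.
LLDF_OUTLIER_THRESHOLD = 2.0


def lldf(ligand_rsr: float, neighbour_rsr: list[float]) -> float:
    """Return the ligand's RSR expressed as a Z-score against nearby residues.

    `neighbour_rsr` holds the RSR values of the polymer residues surrounding
    the ligand; selecting those residues (e.g. by a distance cut-off) is left
    to the caller.
    """
    rsr = np.asarray(neighbour_rsr, dtype=float)
    if rsr.size < 2:
        raise ValueError("need at least two neighbouring residues")
    spread = rsr.std(ddof=1)  # sample standard deviation of neighbour RSRs
    if spread == 0.0:
        raise ValueError("neighbour RSR values have zero spread")
    return float((ligand_rsr - rsr.mean()) / spread)


def is_lldf_outlier(ligand_rsr: float, neighbour_rsr: list[float]) -> bool:
    """Flag the ligand when its LLDF exceeds the assumed threshold."""
    return lldf(ligand_rsr, neighbour_rsr) > LLDF_OUTLIER_THRESHOLD
```

For example, `lldf(0.20, [0.10, 0.12, 0.14, 0.12, 0.12])` is about 5.66, so that (invented) ligand would be flagged. Because the score is relative to the neighbours, it says nothing directly about whether the ligand's own fit is good or bad in absolute terms.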
The quality of small-molecule ligands in Protein Data Bank (PDB) entries has been, and continues to be, a matter of concern for many investigators (Kleywegt & Jones, 1998; Kleywegt et al., 2003; Kleywegt, 2007; Davis et al., 2008; Liebeschuetz et al., 2012; Pozharski et al., 2013; Smart & Bricogne, 2015; Deller & Rupp, 2015). Deciding whether electron density observed in a binding site is compatible with the soaked ligand or instead represents water or buffer molecules is sometimes far from trivial. It is particularly challenging when ligands are relatively small or bind with only partial occupancy (Pearce et al., 2017). Low-resolution structures also tend to be harder to interpret unambiguously, particularly at resolutions worse than 3 Å, where any waters mediating interactions between ligand and protein are unlikely to be clearly observed. Furthermore, fitting a ligand into electron density and then refining the model so that it has reasonable stereochemistry while also fitting the experimental data well can be challenging, particularly for inexperienced crystallographers (Smart & Bricogne, 2015). The details of ligand binding are often crucial to the use of a structure, for instance in structure-guided drug discovery (Scapin et al., 2015). It is therefore important to establish dependable metrics that can be used to assess whether a ligand modelled in a macromolecular structure can be relied upon.
Analysis of the distribution of ligand-specific metrics reported in the wwPDB validation reports (VRs) was initially performed using the ValTrendsDB website (http://ncbr.muni.cz/ValTrendsDB). A current limitation of ValTrendsDB is that analysis is performed per PDB entry, with all ligand metric values for that entry averaged. To get around this limitation, further analysis was performed on an individual-ligand basis using NumPy (http://www.numpy.org), with graphs plotted using Matplotlib (https://matplotlib.org/). The Jupyter Notebook (https://jupyter.org; Shen, 2014) used for the analysis is included in the Supporting Information.
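The reason per-entry averaging is a limitation can be shown with a few invented numbers. In this sketch the entry IDs, the real-space correlation coefficient (RSCC) values and the 0.8 cut-off are all hypothetical and chosen only for illustration; the point is that one poorly fitting ligand can be hidden by well-fitting ligands in the same entry when values are averaged.

```python
import numpy as np

# Invented per-ligand RSCC values keyed by hypothetical entry IDs.
ligand_rscc = {
    "entryA": [0.95, 0.93, 0.55],  # two well-fitting ligands and one poor one
    "entryB": [0.88],
    "entryC": [0.90, 0.91],
}

RSCC_CUTOFF = 0.8  # illustrative threshold only


def per_entry_means(values_by_entry: dict[str, list[float]]) -> dict[str, float]:
    """Collapse each entry to the mean of its ligands (the per-entry view)."""
    return {entry: float(np.mean(vals)) for entry, vals in values_by_entry.items()}


def per_ligand_values(values_by_entry: dict[str, list[float]]) -> np.ndarray:
    """Keep one value per ligand instance (the per-ligand view)."""
    return np.concatenate(
        [np.asarray(vals, dtype=float) for vals in values_by_entry.values()]
    )


entry_means = per_entry_means(ligand_rscc)
ligands = per_ligand_values(ligand_rscc)
print("entries below cut-off:", sum(m < RSCC_CUTOFF for m in entry_means.values()))
print("ligands below cut-off:", int((ligands < RSCC_CUTOFF).sum()))
```

Here the per-entry view finds no entry below the cut-off (entryA averages 0.81), whereas the per-ligand view finds the one ligand with an RSCC of 0.55.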
To assess ligand geometry, the wwPDB validation pipeline uses the Mogul program (Bruno et al., 2004) from the Cambridge Crystallographic Data Centre (CCDC). For each bond length and bond angle in the ligand, a search is performed for small-molecule crystal structures in the Cambridge Structural Database (CSD) that have a similar chemical environment.
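Once matching CSD observations have been found, the ligand's value still has to be compared with them. A minimal sketch of one way to do this, assuming the comparison is a Z-score against the mean and standard deviation of the CSD values with an assumed outlier cut-off of |Z| > 2, is shown below. The CSD search itself is not reproduced; `csd_values` stands in for its hits, and `compare_to_csd` and `GeometryCheck` are hypothetical names, not part of Mogul or any CCDC API.

```python
import statistics
from dataclasses import dataclass


@dataclass
class GeometryCheck:
    observed: float   # value measured in the deposited ligand model
    csd_mean: float   # mean of the matching CSD values
    csd_sd: float     # sample standard deviation of those values
    z: float          # (observed - mean) / sd
    outlier: bool     # True when |z| exceeds the chosen cut-off


def compare_to_csd(observed: float, csd_values: list[float],
                   z_cut: float = 2.0) -> GeometryCheck:
    """Score one bond length or angle against CSD values from a similar chemical environment."""
    if len(csd_values) < 2:
        raise ValueError("need at least two CSD observations")
    mean = statistics.fmean(csd_values)
    sd = statistics.stdev(csd_values)
    if sd == 0.0:
        raise ValueError("CSD values have zero spread")
    z = (observed - mean) / sd
    return GeometryCheck(observed, mean, sd, z, abs(z) > z_cut)
```

With invented CSD bond lengths `[1.52, 1.53, 1.54, 1.53, 1.53]` (in Å), an observed length of 1.60 Å gives a Z-score of roughly 9.9 and is flagged, while 1.535 Å gives about 0.7 and is not.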
In addition to assessing the geometric quality of a ligand modelled in a protein, it is crucial to assess whether the electron density supports the placement (that is, the presence, location, orientation and conformation) of the ligand (Kleywegt, 2007; Davis et al., 2008; Pozharski et al., 2013; Smart & Bricogne, 2015; Adams et al., 2016). Deposition of X-ray structure-factor data only became mandatory in 2008 (Berman et al., 2013). As a result, electron-density maps cannot be calculated for the 10 409 X-ray PDB entries that were deposited before 2008 without structure-factor data; for these entries, validation is necessarily limited to geometric criteria.
The PDB is a treasure trove of data on the interactions between small-molecule ligands and macromolecules. The assessment of ligand geometry using Mogul in the VRs has increased awareness of problems with ligand geometry, but further work is required to present Mogul validation information clearly. The reports also aid assessment of the electron density for bound molecules, and of how well the model fits that density, by providing the LLDF, real-space correlation coefficient (RSCC) and real-space R value (RSR) metrics. Our analysis shows that the LLDF metric has drawbacks and is not reliable in several scenarios: (i) for high-resolution structures in which all the residues in a binding site fit the density very well and have similar RSR values, (ii) when the electron-density fit of both the ligand and the surrounding residues is poor and (iii) when the ligand has only a small number of surrounding polymeric residues. In such cases, both false positives (good ligands listed as outliers) and false negatives (ligands of questionable quality not identified as outliers) may occur.
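The three failure scenarios can be reproduced with invented numbers. This sketch assumes the LLDF is the ligand's real-space R value (RSR) expressed as a Z-score against the RSRs of its neighbouring polymer residues, with an assumed outlier cut-off of 2; the RSR values are made up to mimic each scenario and are not taken from any PDB entry or from the paper's data.

```python
import numpy as np

LLDF_OUTLIER_THRESHOLD = 2.0  # assumed outlier cut-off


def lldf(ligand_rsr, neighbour_rsr):
    """Ligand RSR as a Z-score against the RSRs of its neighbouring residues."""
    rsr = np.asarray(neighbour_rsr, dtype=float)
    return float((ligand_rsr - rsr.mean()) / rsr.std(ddof=1))


# Invented (ligand RSR, neighbour RSRs) pairs, one per scenario.
scenarios = {
    # (i) high resolution: every neighbour fits well and the spread is tiny,
    # so a ligand with a low RSR can still score far above the mean.
    "i_high_resolution": (0.075, [0.060, 0.062, 0.058, 0.061, 0.059]),
    # (ii) poor density everywhere: a large spread hides a poorly fitting ligand.
    "ii_poor_everywhere": (0.45, [0.30, 0.45, 0.25, 0.40, 0.35]),
    # (iii) only two neighbours: the same ligand RSR scores very differently
    # depending on which two residues happen to be nearby.
    "iii_two_neighbours_a": (0.20, [0.10, 0.30]),
    "iii_two_neighbours_b": (0.20, [0.18, 0.19]),
}

for name, (ligand, neighbours) in scenarios.items():
    score = lldf(ligand, neighbours)
    flagged = score > LLDF_OUTLIER_THRESHOLD
    print(f"{name}: LLDF = {score:.2f}, flagged = {flagged}")
```

In scenario (i) the ligand is flagged (LLDF of about 9.5) despite a low RSR, a false positive; in scenario (ii) the ligand with an RSR of 0.45 is not flagged (about 1.3), a false negative; and in scenario (iii) the same ligand RSR gives an LLDF of about 0 or about 2.1 depending only on which two residues are nearby.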