Date Published: August 01, 2019
Publisher: International Union of Crystallography
Author(s): Laurel Jones, Michael Tynes, Paul Smith.
PeakProbe facilitates the automated modelling of ordered solvent in macromolecular crystal structures by analysing features of the electron density and chemical environment surrounding a given coordinate. The extracted data are transformed to a resolution-independent score space and likely solvent models are predicted based on the frequency distributions observed in a large-scale sample of the PDB.
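The abstract does not spell out how raw features are mapped into a resolution-independent score space. One simple way to make a density feature comparable across resolutions is to standardise it against reference statistics from known solvent sites in the same resolution bin. The sketch below illustrates only that idea. The bin edges, the reference means and standard deviations, and the names `BinStats`, `resolution_bin` and `to_score` are hypothetical placeholders. They are not PeakProbe's actual values or interface.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class BinStats:
    """Reference mean and standard deviation of one feature within a resolution bin."""
    mean: float
    std: float


# Hypothetical upper resolution limits (in Å) of each bin; data worse than
# the last edge are assigned to the last bin.
BIN_EDGES = [1.5, 2.0, 2.5, 3.0, 4.0]

# Hypothetical placeholder statistics for a single density feature, one entry
# per bin. Real values would be measured on known solvent sites from the PDB.
REFERENCE = [
    BinStats(2.0, 0.5),
    BinStats(1.6, 0.4),
    BinStats(1.2, 0.4),
    BinStats(1.0, 0.3),
    BinStats(0.75, 0.25),
]


def resolution_bin(resolution: float) -> int:
    """Index of the first bin whose upper edge is >= resolution, clamped to the last bin."""
    return min(bisect_left(BIN_EDGES, resolution), len(BIN_EDGES) - 1)


def to_score(value: float, resolution: float) -> float:
    """Standardise a raw feature value (z-score) against its resolution bin's statistics."""
    stats = REFERENCE[resolution_bin(resolution)]
    if stats.std <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (value - stats.mean) / stats.std
```

For example, `to_score(3.0, 1.2)` returns 2.0. At 1.2 Å the value lies two reference standard deviations above the bin mean. A z-score is only one possible choice. Ranks or percentiles within each bin would serve the same purpose and are more robust to skewed feature distributions.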
Current techniques in macromolecular X-ray crystallography derive structural information by constructing a comprehensive model of the X-ray scattering components within a crystal system. Besides the integral nucleic-acid and amino-acid polymers found in macromolecular structures, scattering components include ligands and cofactors associated with these polymers, as well as both ordered (explicit) and bulk solvent. Crystallographic model building relies on reconstructing electron density from Fourier coefficients whose phases are ultimately derived, in whole or in large part, by Fourier transformation of a model of all scattering components. As a result, crystallographic models are best built iteratively. A more complete and accurate model yields more accurate phases, and these phases give sharper, more interpretable electron density. That density in turn allows a further improved model to be built, which yields further improved phases, and so on. Arriving at an optimal model for a crystal structure therefore requires the inclusion of all scattering components, including the often numerous ordered small-molecule and solvent species within the crystal lattice (Drenth & Mesters, 2007).
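The iterative cycle described above rests on the standard Fourier relations between a model and the electron density computed from it. These are shown below in a simplified, unweighted form that ignores the bulk-solvent correction and the map-coefficient weighting used by modern refinement programs. Here $\mathbf{h}$ runs over Miller indices, $f_j(\mathbf{h})$ is the scattering factor of model atom $j$ at fractional position $\mathbf{x}_j$ (including its occupancy and displacement parameters), and $V$ is the unit-cell volume. The model supplies the phases $\varphi_{\mathrm{calc}}$, which are combined with the observed amplitudes to give the density. Applying the same phases to the amplitude differences gives the difference map.

```latex
F_{\mathrm{calc}}(\mathbf{h}) = \sum_{j} f_j(\mathbf{h})\, e^{2\pi i\, \mathbf{h}\cdot\mathbf{x}_j},
\qquad
\varphi_{\mathrm{calc}}(\mathbf{h}) = \arg F_{\mathrm{calc}}(\mathbf{h})

\rho(\mathbf{x}) = \frac{1}{V} \sum_{\mathbf{h}} \lvert F_{\mathrm{obs}}(\mathbf{h}) \rvert\,
  e^{i\varphi_{\mathrm{calc}}(\mathbf{h})}\, e^{-2\pi i\, \mathbf{h}\cdot\mathbf{x}}

\Delta\rho(\mathbf{x}) = \frac{1}{V} \sum_{\mathbf{h}}
  \bigl( \lvert F_{\mathrm{obs}}(\mathbf{h}) \rvert - \lvert F_{\mathrm{calc}}(\mathbf{h}) \rvert \bigr)\,
  e^{i\varphi_{\mathrm{calc}}(\mathbf{h})}\, e^{-2\pi i\, \mathbf{h}\cdot\mathbf{x}}
```

A positive peak in $\Delta\rho$ marks density that the current model does not explain. The peaks of such a map are therefore natural candidates for missing scattering components, including ordered solvent.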
The central design of PeakProbe focuses on predicting a likely solvent model at a given coordinate in a crystal system. Predictions are made by evaluating features extracted from the electron density and local atomic environment of a given point (termed a ‘peak’) and comparing the extracted values with observed distributions. We conceived PeakProbe for use during both the building and the validation stages of modelling, so we designed the program specifically with the evaluation (‘probing’) of difference-map peaks in mind; hence the name ‘PeakProbe’. Our initial goal was to develop a classifier to distinguish between water and sulfate/phosphate, the most common solvent species in the PDB after water. Crystallographic methods are not well suited to differentiating between sulfate and phosphate, which are tetrahedral oxyanions of nearly identical size, shape and electron count. We therefore consider the two to be indistinguishable and refer to them collectively as ‘sulfate’ hereafter. The core of the PeakProbe classifier uses two scores that encapsulate how sulfate-like the local electron density and the chemical environment of a peak are. Taken together, these two scores can also discriminate among solvent types other than water and sulfate. Specifically, the PeakProbe classifier has been trained to distinguish four classes of solvent: water, sulfate, heterogen and metal. The heterogen class includes other common solvent species with polar or anionic character, such as PEG, glycerol, acetate and chloride. The metal class refers specifically to divalent metal ions such as Mg²⁺, Ca²⁺, Zn²⁺ and Mn²⁺.
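A classifier that compares two scores with observed frequency distributions can be sketched as one two-dimensional histogram per class, combined with class priors through Bayes' rule. The following is a minimal illustration under assumptions that are ours, not the paper's. The score range, the 10 × 10 binning, Laplace smoothing with `alpha`, the function names `fit`, `posterior` and `classify`, and the toy training data are all hypothetical.

```python
from collections import defaultdict

CLASSES = ("water", "sulfate", "heterogen", "metal")
LO, HI, NBINS = -5.0, 5.0, 10  # hypothetical score range and bins per axis


def bin_index(score: float) -> int:
    """Map a score to a histogram bin, clamping values outside [LO, HI)."""
    i = int((score - LO) / (HI - LO) * NBINS)
    return min(max(i, 0), NBINS - 1)


def fit(training):
    """training maps class name -> list of (density_score, environment_score) pairs."""
    counts = {c: defaultdict(int) for c in CLASSES}
    totals = {}
    for c in CLASSES:
        samples = training.get(c, [])
        for s1, s2 in samples:
            counts[c][(bin_index(s1), bin_index(s2))] += 1
        totals[c] = len(samples)
    return counts, totals


def posterior(model, s1, s2, alpha=1.0):
    """P(class | scores) from smoothed 2-D histograms and frequency-based priors."""
    counts, totals = model
    n_all = sum(totals.values())
    cell = (bin_index(s1), bin_index(s2))
    joint = {}
    for c in CLASSES:
        prior = (totals[c] + alpha) / (n_all + alpha * len(CLASSES))
        likelihood = (counts[c][cell] + alpha) / (totals[c] + alpha * NBINS * NBINS)
        joint[c] = prior * likelihood
    norm = sum(joint.values())
    return {c: p / norm for c, p in joint.items()}


def classify(model, s1, s2):
    """Return the most probable class and the full posterior."""
    post = posterior(model, s1, s2)
    return max(post, key=post.get), post


if __name__ == "__main__":
    # Invented toy data: each class clusters in a different region of score space.
    model = fit({
        "water": [(-1.5, -1.5)] * 50,
        "sulfate": [(2.3, 2.1)] * 20,
        "heterogen": [(1.5, -2.5)] * 10,
        "metal": [(-2.5, 3.5)] * 5,
    })
    label, post = classify(model, 2.4, 2.2)
    print(label)  # sulfate
```

Using the training-set class frequencies as priors mirrors the idea of predicting from distributions observed in the PDB. In practice, water's much higher frequency makes the priors matter, since an ambiguous peak is pulled towards water unless its scores fall clearly within another class's distribution.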
PeakProbe performs many of the functions associated with manually building a comprehensive solvent model for a macromolecular structure. Specifically, the program will (i) identify peak locations associated with possible solvent, (ii) extract features from these peaks, (iii) analyse these features by comparing them with features extracted from known solvent models and (iv) predict a likely solvent model for each peak and evaluate any existing model already associated with it. PeakProbe thus serves as a prototype for fully automated solvent modelling. Its approach specifically addresses the gap in current software between tools for automated water modelling and those for automated ligand identification and building.
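Step (i), locating candidate peaks, is commonly done by searching a map for grid points that are local maxima above some multiple of the map's root-mean-square deviation (σ). The sketch below assumes a map sampled on a regular grid covering the whole unit cell, so neighbour lookups wrap around periodically. It also assumes a 3σ cutoff and a strict 26-neighbour maximum test. The function name `find_peaks` and these choices are illustrative, not PeakProbe's documented procedure.

```python
import statistics
from itertools import product


def find_peaks(grid, sigma_cutoff=3.0):
    """Return (i, j, k, value) for local maxima above mean + sigma_cutoff * rmsd.

    grid is a nested list grid[i][j][k] of map values covering one unit cell,
    so neighbour lookups wrap around periodically. Results are sorted by height.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    values = [v for plane in grid for row in plane for v in row]
    mean = statistics.fmean(values)
    rmsd = statistics.pstdev(values, mean)
    threshold = mean + sigma_cutoff * rmsd

    # Offsets to the 26 neighbouring grid points of a voxel.
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

    peaks = []
    for i, j, k in product(range(nx), range(ny), range(nz)):
        v = grid[i][j][k]
        if v < threshold:
            continue
        # Strict maximum: higher than every neighbour, with periodic wrap-around.
        if all(v > grid[(i + di) % nx][(j + dj) % ny][(k + dk) % nz]
               for di, dj, dk in offsets):
            peaks.append((i, j, k, v))
    return sorted(peaks, key=lambda p: p[3], reverse=True)
```

Grid indices convert to fractional coordinates by dividing by the grid dimensions (i/nx, j/ny, k/nz). Because the maximum test is strict, a flat plateau of equal values yields no peak. Real peak-search programs often refine the peak position further by interpolating between grid points.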