Date Published: June 01, 2018
Publisher: International Union of Crystallography
Author(s): Tristan Ian Croll.
ISOLDE is an interactive molecular-dynamics environment for rebuilding models against experimental cryo-EM or crystallographic maps. Analysis of its results reinforces the need for great care when validating models built into low-resolution data.
As the resolution of a crystallographic or cryo-EM data set degrades, the challenge faced by the model builder increases rapidly as first individual atoms, then small bonded groups and eventually entire residues become effectively unidentifiable from the density alone. The difficulty is compounded by the fact that low-resolution structures also tend to be large structures (Supplementary Fig. S1), with thousands or even tens of thousands of residues to contend with. It is unsurprising, then, that the rate of residual errors in published structures grows steeply with decreasing resolution. This fact has long been recognized (Kleywegt & Jones, 1995), and over the past two decades it has been common to see 3–4 Å resolution structures published with outlier rates 1–2 orders of magnitude higher than would be expected from atomic resolution structures (Croll & Andersen, 2016). While standards have improved over time (aided in no small part by an ever-increasing supply of high-resolution structures to mine for reference models), it remains common for novel low-resolution structures (with no useful high-resolution homology templates) to be published with statistics indicating high levels of residual error.
ISOLDE is implemented as a Python 3.6 plugin to UCSF ChimeraX (Goddard et al., 2017) and can be installed on Linux and Mac operating systems via its ToolShed (Tools/More Tools in the ChimeraX menu). Handling of reciprocal-space data and crystallographic symmetry is provided via a ChimeraX plugin to Clipper-Python (McNicholas et al., 2017). MD calculations are handled by OpenMM 7.1 (Eastman et al., 2017) using the AMBER ff14SB force field (Maier et al., 2015) in GB-Neck2 implicit solvent (Nguyen et al., 2013) with grid-based protein backbone corrections (Perez et al., 2015). Preliminary support for three-dimensional haptic interaction via the CHAI3D library (Conti et al., 2003) is available on request. While a CPU-only implementation is provided, in practice an OpenCL- or CUDA-capable GPU (with all necessary drivers correctly installed) is required for adequate performance. Illustrative benchmarks for two machines with very different capabilities (a MacBook Air using its onboard GPU and a desktop-replacement gaming laptop with an NVIDIA GTX1070 GPU) are provided in the Supporting Information (§S3). The former supports somewhat interactive simulations of up to a few thousand atoms (sufficient for small-scale local remodelling tasks) and is capable of non-interactive settling of the entire 60 000-atom MCM2-7 complex. The latter allows interactive speeds for up to about 20 000 atoms (on the order of 1000 protein residues).
The recently published 3787-residue yeast MCM2-7 heterohexamer (PDB entry 3ja8; Li, Zhai et al., 2015) was built into 3.8 Å resolution cryo-EM density starting from homology models generated from a distantly related archaeal homohexamer using CHAINSAW (Stein, 2008), followed by extensive iterations of manual rebuilding in Coot and refinement with phenix.real_space_refine using Ramachandran and rotamer restraints. Particularly given the scale of the challenge, a cursory glance at the validation statistics provided on any of the PDB webservers suggested no serious cause for alarm: while the clashscore of 28 is certainly high, the numbers of Ramachandran outliers (1.1%) and in particular side-chain outliers (0.1%) are very low for a structure of this resolution. Closer inspection, however, revealed a somewhat more problematic reality.
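For a rough sense of scale, the percentages quoted above can be converted into approximate residue counts. The sketch below assumes, for simplicity, that both percentages are taken over all 3787 residues; validation tools typically evaluate a slightly smaller subset, so the counts are indicative only. The helper name `approx_outlier_count` is illustrative and not part of any validation package.

```python
def approx_outlier_count(n_residues, percent):
    """Approximate number of residues corresponding to an outlier percentage.

    Assumes the percentage is taken over all n_residues, which is a
    simplification: real validation reports exclude some residues.
    """
    return round(n_residues * percent / 100)


N_RESIDUES = 3787  # size of the MCM2-7 model quoted in the text

rama_outliers = approx_outlier_count(N_RESIDUES, 1.1)      # Ramachandran outliers
rotamer_outliers = approx_outlier_count(N_RESIDUES, 0.1)   # side-chain outliers

print(f"Ramachandran outliers: about {rama_outliers} residues")
print(f"Side-chain outliers:   about {rotamer_outliers} residues")
```

Under these assumptions, the reported figures correspond to roughly 42 Ramachandran outliers and about 4 side-chain outliers across the whole complex.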
For the purposes of this manuscript, I have demonstrated that the ISOLDE environment combined with an existing refinement package allows a single user, working on a moderately priced workstation, to rebuild a large, low-resolution structure to near-atomic resolution standards in approximately one week of work, without reference to external information such as reference models. This is not intended as a suggestion that such extensive manual interaction is necessary or desirable. In fact, it is likely that a majority of the improvements identified, in particular those that involve simply flipping 1–2 adjacent peptide bonds, should be readily manageable by automated methods such as those recently described for use in moderate-resolution crystal structures (Touw et al., 2015). In addition to the many possible permutations in the use of ISOLDE with external tools in a larger workflow, there is substantial scope for the automation of various common tasks (and the implementation of existing successful algorithms) using combinations of the various unit operations defined in ISOLDE itself. A simple example of such a combination is the semi-automated shifting of protein residues in register described in §4.3, which is accomplished by the concerted action of many moving position restraints.
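The idea of a register shift driven by many moving position restraints can be illustrated with a toy schedule of restraint targets. This is a minimal sketch under stated assumptions, not ISOLDE's implementation or API: the function names, the use of Cα positions alone, the straight-line interpolation between neighbouring positions and the `steps_per_residue` parameter are all hypothetical. It computes only where each restraint target would sit at each step; in an interactive simulation the restrained atoms would be pulled towards these targets by the MD engine.

```python
import math


def point_on_trace(trace, s):
    """Point at fractional index s along a polyline of (x, y, z) tuples."""
    i = max(0, min(int(math.floor(s)), len(trace) - 2))
    t = s - i
    a, b = trace[i], trace[i + 1]
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def moving_restraint_targets(trace, first, last, shift, steps_per_residue=4):
    """Yield one {residue index: target xyz} dict per step.

    Residues first..last (inclusive) are walked along the existing trace
    until each target sits where residue (index + shift) started.
    Hypothetical helper for illustration only.
    """
    if first > last:
        raise ValueError("first must not exceed last")
    lowest = min(first, first + shift)
    highest = max(last, last + shift)
    if lowest < 0 or highest > len(trace) - 1:
        raise ValueError("shift would move residues off the end of the trace")
    direction = 1 if shift > 0 else -1
    n_steps = abs(shift) * steps_per_residue
    for step in range(1, n_steps + 1):
        offset = direction * step / steps_per_residue
        yield {k: point_on_trace(trace, k + offset)
               for k in range(first, last + 1)}


# Toy example: six evenly spaced positions on a line; shift residues 1-2 by +2.
trace = [(float(i), 0.0, 0.0) for i in range(6)]
frames = list(moving_restraint_targets(trace, 1, 2, 2, steps_per_residue=2))
for n, frame in enumerate(frames, start=1):
    print(n, frame)
```

Moving the targets in small increments along the existing trace, rather than jumping them directly to their final positions, keeps each restraint close to its atom at every step, which is the property that makes a concerted shift of many residues tractable in an interactive setting.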