Date Published: February 01, 2017
Publisher: International Union of Crystallography
Author(s): Roberto A. Steiner, Julie A. Tucker.
An overview of the process of ligand restraint generation for macromolecular crystallographic refinement is given.
The limited resolution at which macromolecular crystals typically diffract does not allow crystallographic refinement to be carried out using X-ray diffraction data alone. Prior knowledge, often in the form of stereochemical restraints, must also be taken into account to achieve chemically plausible structures (Evans, 2007). Macromolecular refinement packages thus minimize a target function with two components, one utilizing geometry (or prior knowledge) and one utilizing experimental X-ray knowledge,

$$f_{\mathrm{total}} = f_{\mathrm{geom}} + w\,f_{\mathrm{X\text{-}ray}},$$

where $f_{\mathrm{total}}$ is the total target function to be minimized, consisting of functions controlling the geometry of the model ($f_{\mathrm{geom}}$) and the fit of the model parameters to the experimental data ($f_{\mathrm{X\text{-}ray}}$), and $w$ is a weight that balances the relative contributions of these two components. Most packages provide optimization routines that select $w$ automatically. From a Bayesian viewpoint, these functions have the following probabilistic interpretation:

$$f_{\mathrm{total}} = -\log[P_{\mathrm{posterior}}(\mathrm{model};\,\mathrm{obs})], \qquad f_{\mathrm{geom}} = -\log[P_{\mathrm{prior}}(\mathrm{model})], \qquad f_{\mathrm{X\text{-}ray}} = -\log[P_{\mathrm{likelihood}}(\mathrm{obs};\,\mathrm{model})].$$

A number of research articles describe these functions in detail, together with their implementation in the various refinement packages available and the mathematical tools used to minimize $f_{\mathrm{total}}$. In the case of REFMAC5, the software provided with the CCP4 suite, the reader is encouraged to consult the following articles: Murshudov et al. (1997, 1999, 2011), Nicholls et al. (2012), Skubák et al. (2004, 2009), Steiner et al. (2003) and Vagin et al. (2004).
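The weighted combination of a geometry term and an X-ray term described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the target used by REFMAC5 or any other package: the geometry term is taken to be a sum of squared, sigma-scaled deviations from ideal values, and the X-ray term is a plain least-squares residual standing in for the likelihood-based functions used in practice. All function names and numbers are hypothetical.

```python
def f_geom(values, ideals, sigmas):
    """Toy geometry term: sum of squared deviations from ideal values,
    each scaled by its standard uncertainty (assumed harmonic restraints)."""
    return sum(((v - i) / s) ** 2 for v, i, s in zip(values, ideals, sigmas))


def f_xray(f_obs, f_calc):
    """Toy data term: least-squares residual between observed and calculated
    amplitudes (a stand-in for a maximum-likelihood target)."""
    return sum((o - c) ** 2 for o, c in zip(f_obs, f_calc))


def f_total(values, ideals, sigmas, f_obs, f_calc, w):
    """Total target: geometry term plus weighted X-ray term."""
    return f_geom(values, ideals, sigmas) + w * f_xray(f_obs, f_calc)


# Hypothetical example: one bond-length restraint and two reflections.
bond_lengths, ideal_lengths, length_sigmas = [1.55], [1.53], [0.02]
observed, calculated = [10.0, 20.0], [9.0, 22.0]

for w in (0.1, 0.5, 1.0):
    total = f_total(bond_lengths, ideal_lengths, length_sigmas,
                    observed, calculated, w)
    print(f"w = {w}: f_total = {total:.3f}")
```

Increasing w makes the data term dominate the sum, whereas decreasing it pulls the model towards the ideal geometry; this is the balance that automatic weighting schemes try to strike.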
In general terms, the process of generating a set of restraints, or ‘dictionary’, for a small molecule involves (i) taking a description of the molecule as an input, (ii) processing this description to derive atom energy types and connectivities, and finally (iii) using this information to generate an idealized set of coordinates, which allows the ligand to be fitted to electron density, and a list of geometric restraints with associated weights, which allows the fitted ligand to be refined (Fig. 1). Each program uses a different approach to achieve these latter two steps, and these will be covered in more detail in §3. First, we will discuss the possible types of input to, and output from, a dictionary-generation program, and illustrate the importance of providing an appropriate molecular description. We will use a hypothetical molecule, which we have called chimerin1 (Fig. 2), to illustrate the principles of the dictionary-generation process.
Chimerin1 has 29 atoms, of which 21 are heavy atoms (i.e. non-H), and it can be described using 31 bonds, 51 angles, 19 dihedrals (or torsions), one chiral centre and at least two planar restraints. These restraint types are illustrated diagrammatically in Fig. 2(a). One could write out the restraints for chimerin1 by hand, and historically that is how dictionaries were constructed; however, as the size and complexity of a novel molecule increase, this rapidly becomes unmanageable. Even for a relatively small molecule, getting the chemistry right can be nontrivial.
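To give a feel for how quickly the restraint lists grow, the following Python sketch enumerates angles and torsion paths from a bond list alone. It is an illustration only: the structure of chimerin1 is not reproduced here, so an ethanol-like toy graph with hypothetical atom names is used instead, and the sketch lists every possible torsion path, whereas a dictionary generator may keep only a selected subset. The function names are hypothetical and do not belong to any of the programs discussed.

```python
from itertools import combinations


def neighbours(bonds):
    """Build an adjacency map {atom: set of bonded atoms} from a bond list."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def angles(bonds):
    """Every a-centre-c triple: one angle per pair of bonds sharing an atom."""
    adj = neighbours(bonds)
    return [(a, centre, c)
            for centre in sorted(adj)
            for a, c in combinations(sorted(adj[centre]), 2)]


def torsions(bonds):
    """Every a-b-c-d path across each bond b-c (all paths, not a chosen subset)."""
    adj = neighbours(bonds)
    paths = []
    for b, c in bonds:
        for a in sorted(adj[b] - {c}):
            for d in sorted(adj[c] - {b}):
                if a != d:  # skip three-membered-ring degeneracies
                    paths.append((a, b, c, d))
    return paths


def ring_count(bonds):
    """Number of independent rings, assuming a single connected molecule."""
    atoms = {atom for bond in bonds for atom in bond}
    return len(bonds) - len(atoms) + 1


# Hypothetical toy input: ethanol with explicit H atoms.
ethanol = [("C1", "C2"), ("C2", "O"), ("C1", "H11"), ("C1", "H12"),
           ("C1", "H13"), ("C2", "H21"), ("C2", "H22"), ("O", "HO")]

print(len(ethanol), len(angles(ethanol)), len(torsions(ethanol)),
      ring_count(ethanol))
```

For this nine-atom toy graph the sketch finds 8 bonds, 13 angles, 12 torsion paths and no rings. Applying the same ring formula to the figures quoted above for chimerin1 (31 bonds, 29 atoms) would imply three independent rings, if the molecule is a single connected graph.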
Dictionary-generator output should be viewed as a starting point, which will likely evolve during the refinement and model-building process (see, for example, Bax et al., 2017; Agrawal et al., 2013; Chan et al., 2015). One way to check the refined or idealized coordinate geometry (and thereby the dictionary) is to use the Cambridge Crystallographic Data Centre (CCDC) software Mogul (Bruno et al., 2004) to search against the small-molecule data in the Cambridge Structural Database (CSD). Tools for doing this are now available in Coot (Emsley, 2017) and through the PDB Validation Server (Adams et al., 2016). The version of chimerin1 generated using ACEDRG shows good overall agreement with the data in the CSD, as reflected in the low root-mean-square Z (r.m.s.Z) values for bond lengths and angles (Table 4). Two bonds and six angles are, however, flagged as being unusual; the bond and angle outliers with the highest Z-scores are indicated in Fig. 2(e) (labelled A1 and B1, respectively). Several torsion (or dihedral) angles are also flagged; T1 in Fig. 2(e) had the largest dmin value. This torsion angle is quite variable across the output coordinates shown in Fig. 4(a), likely reflecting differences in the conformer/coordinate-generation methods used by the various programs. Interestingly, three angles and four torsions in chimerin1 are not represented in the CSD, and several others are represented by fewer than five examples, a consequence of the novel chemistry of our hypothetical example molecule.
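The Z-score and r.m.s.Z statistics mentioned above can be computed as in the following Python sketch. It does not call Mogul or any validation server; the observed values, reference means and standard deviations are invented for illustration, and the outlier threshold of |Z| > 2 is an assumption of this sketch rather than a value taken from the text.

```python
import math


def z_scores(observed, ref_mean, ref_sd):
    """Z-score of each observed value against its reference distribution."""
    return [(o - m) / s for o, m, s in zip(observed, ref_mean, ref_sd)]


def rms_z(z):
    """Root-mean-square of a list of Z-scores."""
    return math.sqrt(sum(v * v for v in z) / len(z))


def outliers(z, threshold=2.0):
    """Indices of restraints whose |Z| exceeds the (assumed) threshold."""
    return [i for i, v in enumerate(z) if abs(v) > threshold]


# Hypothetical bond lengths (in angstroms) and made-up reference statistics.
observed = [1.53, 1.40, 1.25]
ref_mean = [1.52, 1.39, 1.22]
ref_sd = [0.01, 0.01, 0.01]

z = z_scores(observed, ref_mean, ref_sd)
print(f"r.m.s.Z = {rms_z(z):.2f}, outlier indices = {outliers(z)}")
```

A low r.m.s.Z indicates that the geometry as a whole sits close to the reference distributions, while the per-restraint Z-scores identify the individual bonds or angles that merit inspection.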
In summary, a number of ligand dictionary generators are now available, with more in development. They support multiple input and output formats, and use a variety of approaches, both empirical and theoretical, to derive restraint information. Each has its own features and limitations, and all will provide a good starting point for further manual intervention and iterative improvement as knowledge of the small-molecule properties within the macromolecular complex becomes clearer during refinement.
The following reference is cited in the Supporting Information for this article: R Core Team (2015).