Date Published: February 01, 2018
Publisher: International Union of Crystallography
Author(s): Isabel Usón, George M. Sheldrick.
Experimental phasing of macromolecular crystals is described and explained, with the emphasis on its implementation in the programs SHELXC, SHELXD and SHELXE, which are also used in a number of macromolecular structure-solution pipelines.
Small-molecule structures are routinely solved by so-called direct methods (Usón & Sheldrick, 1999; Sheldrick et al., 2011; Giacovazzo, 2014; Sheldrick, 2015), but this requires diffraction data to atomic resolution (1.2 Å or better) for typical organic structures. Occasionally, when no model is available for molecular replacement, the anomalous signal is too weak and it is not possible to soak in heavy atoms, it is also necessary to solve macromolecular structures by pure direct methods. A recent example was the solution of the parallel double-helix RNA structure of polyadenosine (Safaee et al., 2013), confirming a 50-year-old prediction by Watson and Crick based on rather diffuse fibre-diffraction photographs (Rich et al., 1961). For a number of years, the MAD (multi-wavelength anomalous diffraction) method (Hendrickson, 1985; Hendrickson et al., 1985), exploiting small intensity differences of Friedel opposites and of reflections measured at different wavelengths, e.g. for selenomethionine derivatives or proteins in which elements such as iron or zinc are naturally present, was the experimental phasing method of choice. More recently, improvements in software and data quality have made it easier to solve structures by the SAD (single-wavelength anomalous diffraction) method in favourable cases using very weak anomalous scatterers, such as sulfur, that are present naturally in many proteins.
The generation of a polyalanine trace in SHELXE is designed to improve the phases but does not include identification of the individual amino-acid residues and matching the sequence as performed in ARP/wARP (Lamzin et al., 1999), RESOLVE (Terwilliger, 2003) and Buccaneer (Cowtan, 2006).
Six structures (Table 2) have been used to test and illustrate the new features in SHELXE as described above. SAD data for apoferritin and titin protein A168-A169 are from Mueller-Dieckmann et al. (2007) and those for fibronectin are from Rudiño-Piñera et al. (2007); Kgp prodomain (Pomowski et al., 2017) is a difficult all-β protein with a low solvent content. The C-terminal domain of autophagy-related protein 38 (atg38; Ohashi et al., 2016) and human synaptonemal complex protein 3 (SYCP3; Syrjanen et al., 2014) are large coiled-coil proteins, which for the purpose of this study were phased from SAD data at resolutions of 2.4 Å even when originally solved by MAD or SIRAS. The anomalous scatterer substructures summarized in Table 2 were located with SHELXD. Because the autotracing algorithm starts from random seeds, its results vary from run to run; for each of these structures, 20 SHELXE jobs were therefore run, varying the time parameter (-t) to obtain 20 different sets of seeds. The results are plotted in Fig. 4, which displays the weighted mean phase error (MPE) of the final electron-density map obtained for each of the six proteins with the various tracing algorithms and otherwise equivalent parameterization. Fig. 5 shows the main-chain coverage of the trace.
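To make the quantity plotted in Fig. 4 concrete, the following is a minimal sketch of a weighted mean phase error, assuming that the trial and reference phases refer to the same origin and hand and that one non-negative weight is supplied per reflection. The function name and the choice of weights are illustrative assumptions; the exact weighting used by SHELXE is not specified in this section.

```python
def weighted_mean_phase_error(phases_deg, ref_phases_deg, weights):
    """Return the weighted mean absolute phase difference in degrees.

    All three arguments are equal-length sequences, one entry per reflection.
    The weights are an assumption of this sketch (e.g. a figure of merit).
    """
    if not (len(phases_deg) == len(ref_phases_deg) == len(weights)):
        raise ValueError("inputs must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    accumulated = 0.0
    for phi, ref, w in zip(phases_deg, ref_phases_deg, weights):
        # Phases are periodic: wrap the difference into [-180, 180)
        # so that 350 degrees versus 10 degrees counts as a 20 degree error.
        diff = (phi - ref + 180.0) % 360.0 - 180.0
        accumulated += w * abs(diff)
    return accumulated / total_weight


if __name__ == "__main__":
    # Two reflections with equal weight: errors of 20 and 90 degrees.
    print(weighted_mean_phase_error([350.0, 90.0], [10.0, 0.0], [1.0, 1.0]))
```

Random phases give a value close to 90° under this definition, so lower values indicate a more interpretable map; comparing the spread of values over repeated runs shows how much of the variation is due to the seeds alone.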
This paper provides an overview of experimental phasing using SHELXC, SHELXD and SHELXE, concentrating on SAD phasing, which is currently the most popular form of experimental phasing. It describes various improvements in the algorithms that can make the difference between success and failure in borderline cases. A number of innovations in main-chain tracing have been added to SHELXE; they are designed to improve the performance at lower resolution and for all-β structures. Evaluation of the accuracy of the polypeptide traced after each cycle shows that the main improvement of the constrained algorithm is a reduction in the number of false traces rather than an immediate increase in the number of correct traces. The benefit builds up in the following cycles as the map improves, whereas if there are too many poor traces in the initial cycles it may not be possible to recover. In general, a CC of 25% or higher for the main-chain trace against native data to 2.5 Å resolution or better indicates a successful solution.
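The rule of thumb above can be written as a simple check. The sketch below assumes that the CC is an ordinary Pearson correlation coefficient between observed and trace-derived values expressed as a percentage; the function names are hypothetical, and the precise quantities that SHELXE correlates are not specified in this section.

```python
import math


def pearson_cc(observed, calculated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(calculated) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_obs = sum(observed) / n
    mean_calc = sum(calculated) / n
    sxy = sum((o - mean_obs) * (c - mean_calc)
              for o, c in zip(observed, calculated))
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    syy = sum((c - mean_calc) ** 2 for c in calculated)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def looks_solved(cc_percent, resolution_angstrom,
                 cc_threshold=25.0, resolution_limit=2.5):
    """Apply the rule of thumb: CC >= 25% with native data to 2.5 A or better.

    'Better' resolution means a numerically smaller value in angstroms.
    """
    return cc_percent >= cc_threshold and resolution_angstrom <= resolution_limit


if __name__ == "__main__":
    cc = 100.0 * pearson_cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    print(cc, looks_solved(cc, 2.0))
```

The criterion says nothing about data that extend only to lower resolution, so a `False` result at, say, 3.0 Å means that the rule does not apply rather than that the structure is unsolved.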