Research Article: Challenge data set for macromolecular multi-microcrystallography

Date Published: February 01, 2019

Publisher: International Union of Crystallography

Author(s): James M. Holton.


Synthetic macromolecular crystallography diffraction-image data were generated to demonstrate the challenges of combining data from multiple crystals with indexing ambiguity in the context of heavy radiation damage. The nature of the problems encountered using contemporary data-processing programs is summarized.

Data sets that challenge the capabilities of modern structure-solution procedures, algorithms and software are difficult for developers to obtain for a very simple reason: as soon as a solution is reached, the data set is no longer considered to be challenging. Data sets that are recalcitrant to current approaches are also absent from public databases such as the Protein Data Bank (Berman et al., 2002) and image repositories (Grabowski et al., 2016; Morin et al., 2013), which contain only data used for solved structures. When testing the limits of software, it is generally much more useful to know ahead of time what the correct result will be. This allows partially successful solutions to be detected and optimized at every point in the process, even if downstream procedures fail.

To demonstrate the utility of this challenge, the difficulties encountered when trying to solve the structure using MOSFLM (Leslie & Powell, 2007), LABELIT (Sauter & Poon, 2010), HKL-2000 (Otwinowski & Minor, 1997), XDS/XSCALE (Kabsch, 2010), DIALS (Winter et al., 2018), PHENIX (Adams et al., 2010), the CCP4 suite (Winn, 2003) and BLEND (Foadi et al., 2013) are discussed here. Specific bugs and program-to-program differences will not be detailed, as software is continuously improving and contemporary shortcomings have little archival value; instead, the algorithmic challenge of achieving speed and robustness simultaneously will be evaluated. The performance of particular programs with this data set is best described by their authors, for example Gildea & Winter (2018).

The challenges to macromolecular structure determination using data from a large number of small crystals lie primarily in the combinatorial nature of the data analysis. Recent landmark achievements such as those reported by Brehm & Diederichs (2014), Liu & Spence (2014), Gildea & Winter (2018), Diederichs (2016, 2017) and, in this issue, Foos et al. (2019) represent important mathematical advances in handling this problem and significant practical progress towards solving the present challenge. The indexing-ambiguity problem itself may now be regarded as solved, with the proviso that current approaches remain vulnerable to incorrect lattice assignment, such as cell doubling, and to the choice of radiation-damage cutoff during processing. These choices are still left to the user, and since the correct choice is generally not clear until the structure has been solved, the only robust strategy remains an exhaustive evaluation of all possible lattice-type and damage-cutoff options. By ‘cheating’, this work was able to solve the challenge structure using only the first 36 of the 100 crystals presented; any further work that approaches or improves on this number without cheating will translate directly into real-world projects finishing sooner and consuming fewer difficult-to-produce isomorphous crystalline samples.
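To make the combinatorial nature of the problem concrete, the following is a minimal Python sketch of brute-force resolution of a twofold indexing ambiguity. It is not the method used by any of the programs or papers cited above. Each crystal is either kept as indexed or reindexed by an alternative operator, and the assignment that maximizes the summed pairwise intensity correlation is chosen. The operator `swap_hk`, (h,k,l) → (k,h,−l) (the ambiguity operator for point group 4), is an assumed example, and all function names are hypothetical. A real implementation would also map indices to the asymmetric unit and weight correlations by the number of shared reflections; this sketch does neither.

```python
import itertools
from typing import Callable, Dict, List, Tuple

HKL = Tuple[int, int, int]
Intensities = Dict[HKL, float]  # one crystal: Miller index -> intensity


def swap_hk(hkl: HKL) -> HKL:
    """Assumed example ambiguity operator (h,k,l) -> (k,h,-l)."""
    h, k, ell = hkl
    return (k, h, -ell)


def reindex(data: Intensities, op: Callable[[HKL], HKL]) -> Intensities:
    """Apply a reindexing operator to every Miller index of one crystal."""
    return {op(hkl): i for hkl, i in data.items()}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def cc(a: Intensities, b: Intensities) -> float:
    """Correlation of two crystals over the raw indices they share (no ASU mapping)."""
    common = sorted(set(a) & set(b))
    return pearson([a[k] for k in common], [b[k] for k in common])


def resolve_ambiguity(crystals: List[Intensities],
                      op: Callable[[HKL], HKL]) -> Tuple[int, ...]:
    """Return, per crystal, 0 (as indexed) or 1 (reindexed by op),
    maximizing the summed pairwise CC by exhaustive search."""
    n = len(crystals)
    if n == 0:
        return ()
    variants = [(c, reindex(c, op)) for c in crystals]
    # Precompute the CC for every crystal pair and each of the 4 flip combinations.
    pair_cc = {}
    for i in range(n):
        for j in range(i + 1, n):
            for fi in (0, 1):
                for fj in (0, 1):
                    pair_cc[i, j, fi, fj] = cc(variants[i][fi], variants[j][fj])
    best_score, best = float("-inf"), ()
    # Crystal 0 is fixed as the reference, leaving 2**(n-1) assignments to try.
    for tail in itertools.product((0, 1), repeat=n - 1):
        flips = (0,) + tail
        score = sum(pair_cc[i, j, flips[i], flips[j]]
                    for i in range(n) for j in range(i + 1, n))
        if score > best_score:
            best_score, best = score, flips
    return best
```

Because the first crystal is fixed as the reference, the loop visits 2^(N−1) assignments: 2^35 ≈ 3.4 × 10^10 for the 36 crystals mentioned above, before any lattice-type or damage-cutoff choices multiply the count further. This exponential cost is why practical approaches must avoid exhaustive search over ambiguity assignments.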
