Date Published: July 01, 2018
Publisher: International Union of Crystallography
Author(s): Adam J. Simpkin, Felix Simkovic, Jens M. H. Thomas, Martin Savko, Andrey Lebedev, Ville Uski, Charles Ballard, Marcin Wojdyr, Rui Wu, Ruslan Sanishvili, Yibin Xu, María-Natalia Lisa, Alejandro Buschiazzo, William Shepard, Daniel J. Rigden, Ronan M. Keegan.
SIMBAD is a sequence-independent molecular-replacement pipeline for solving difficult molecular-replacement cases where contaminants have been crystallized. It can also be used to find structurally related search models where no obvious homologue can be found through sequence-based searching.
In X-ray crystallography, determining the three-dimensional structure of a protein remains a difficult task. Even when crystals diffract to high resolution, many projects flounder because of the challenges involved in overcoming the phase problem. For macromolecules with more than a few hundred atoms, solving the phase problem directly is currently not viable, so an alternative approach must be used. Molecular replacement (MR) is the most popular route to solving the problem as it is quick, inexpensive and can be highly automated (Evans & McCoy, 2008; Long et al., 2008). MR exploits the fact that proteins with similar amino-acid sequences typically adopt similar three-dimensional structures. Where a known structure has a sequence similar to that of the target, and assuming that there is corresponding structural similarity, its phase information can often be used as a starting point for the phases of the unknown structure. The procedure requires that the known structure be reoriented and positioned correctly in the unit cell of the target. Programs incorporating sophisticated scoring systems, such as Phaser (McCoy et al., 2007) and MOLREP (Vagin & Teplyakov, 2010), have been developed to perform this task. However, the selection of an appropriate search model remains a limiting factor in MR. Sequence similarity does not always ensure structural similarity, particularly where the similarity falls below 30% (Krissinel & Henrick, 2004; Krissinel, 2007). Some recent studies have therefore sought alternative ways of finding structurally similar search models. Using ab initio models that approximate the target structure as search models has been shown to work by Qian et al. (2007) and Rigden et al. (2008), and this approach can be exploited using the AMPLE application (Bibby et al., 2012). Other approaches use idealized fragments, or fragments and motifs that recur in known structures, as search models in MR.
ARCIMBOLDO (Rodríguez et al., 2009) and Fragon (Jenkins, 2018) are two developments that exploit this approach. All of these applications rely mainly on small but highly accurate fragments being placed correctly in the unit cell of the target. In the most extreme cases, where data are available to 1 Å resolution or better, it has been shown that single atoms can serve as a successful search model (McCoy et al., 2017).
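Placing a search model in MR amounts to applying a rotation R and a translation t to every atom, x' = Rx + t; MR programs search for the R and t that best explain the diffraction data. The minimal sketch below shows only that rigid-body transformation in Cartesian coordinates, assuming a hypothetical three-atom model and a rotation about the z axis chosen purely for illustration. It performs no search, no scoring and no crystallographic symmetry handling, and it is not how Phaser or MOLREP are implemented.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rotation_z(angle_deg: float) -> Mat3:
    """Rotation matrix for a rotation of angle_deg degrees about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return ((c, -s, 0.0),
            (s, c, 0.0),
            (0.0, 0.0, 1.0))


def place_model(coords: List[Vec3], rot: Mat3, shift: Vec3) -> List[Vec3]:
    """Apply the rigid-body operation x' = R x + t to every atom of a model."""
    placed = []
    for atom in coords:
        new = tuple(
            sum(rot[i][j] * atom[j] for j in range(3)) + shift[i]
            for i in range(3)
        )
        placed.append(new)
    return placed


# Hypothetical three-atom "search model" (coordinates in Å).
model = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
placed = place_model(model, rotation_z(90.0), (10.0, 0.0, 0.0))
for atom in placed:
    print(tuple(round(v, 3) for v in atom))
```

Because the operation is rigid, interatomic distances within the model are unchanged; only its orientation and position relative to the target's unit cell differ.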
SIMBAD has been designed for a range of scenarios in which conventional sequence-based MR methods have failed. So far, SIMBAD has proved effective at identifying crystal contaminants, as have similar methods such as MarathonMR (Hatti et al., 2016) and ContaMiner (Hungler et al., 2016), suggesting that contamination is one of the main reasons that conventional methods fail. Like MarathonMR (Hatti et al., 2017), SIMBAD has also proved effective in cases where crystals have been mislabelled, which can happen for various reasons, especially in multi-laboratory collaborations. SIMBAD has also determined the structures of unsequenced proteins and resolved a case of swapped crystallization trays (data not shown). More ambitiously, SIMBAD provides a possible means of solving a novel target that is structurally similar to an existing protein in the MoRDa DB but whose relationship to that structure is not apparent from sequence comparison alone.
Crystal contamination is a possibility that every crystallographer should bear in mind when performing an experiment. SIMBAD provides a rapid and reliable means of checking for the presence of a contaminant. It is also useful when a crystal has been misidentified, and in scenarios where no obvious homologue is available as a search model or where the most suitable search model is not among those ranked highest by sequence comparison. The lattice-parameter and contaminant searches in SIMBAD are very quick, and we therefore suggest running them routinely on beamlines after data collection to identify possible cases of contaminant crystallization or protein mislabelling.
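The idea behind a lattice-parameter search is that a crystal of a known protein will often have a unit cell close to one already deposited, so comparing cells alone can flag candidates before any MR is run. The sketch below is a deliberately naive illustration of that idea, not SIMBAD's scoring: it ranks a small hypothetical set of entries (names and cell values invented for this example) by an unweighted Euclidean distance over a, b, c (Å) and α, β, γ (degrees). A practical comparison would first reduce both cells to a standard setting, such as the Niggli-reduced cell, because the same lattice can be described by different unit cells, and would weight lengths and angles sensibly rather than mixing their units directly.

```python
import math
from typing import Dict, List, Tuple

# (a, b, c) in Å followed by (alpha, beta, gamma) in degrees.
Cell = Tuple[float, float, float, float, float, float]


def cell_distance(c1: Cell, c2: Cell) -> float:
    """Naive, unweighted Euclidean distance over all six cell parameters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))


def rank_by_cell(target: Cell, db: Dict[str, Cell],
                 max_hits: int = 3) -> List[Tuple[str, float]]:
    """Return up to max_hits (name, distance) pairs, closest cell first."""
    scored = [(name, cell_distance(target, cell)) for name, cell in db.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:max_hits]


# Hypothetical reference cells, invented for illustration only.
hypothetical_db: Dict[str, Cell] = {
    "hyp_1": (78.0, 78.0, 37.5, 90.0, 90.0, 90.0),
    "hyp_2": (50.0, 60.0, 70.0, 90.0, 90.0, 90.0),
    "hyp_3": (100.0, 100.0, 100.0, 90.0, 90.0, 120.0),
}
target_cell: Cell = (78.5, 78.5, 37.0, 90.0, 90.0, 90.0)

hits = rank_by_cell(target_cell, hypothetical_db)
for name, dist in hits:
    print(f"{name}: {dist:.2f}")
```

Entries with a small distance would then be passed on to MR as candidate search models; a cell match alone is only a hint and must be confirmed by an actual MR solution.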