Date Published: March 01, 2020
Publisher: International Union of Crystallography
Author(s): Rafael Junqueira Borges, Kathrin Meindl, Josep Triviño, Massimo Sammito, Ana Medina, Claudia Millán, Martin Alcorlo, Juan A. Hermoso, Marcos Roberto de Mattos Fontes, Isabel Usón.
When phasing cannot be accomplished from a partial polyalanine starting model, extending the model with side chains in a multi-solution way may succeed. SEQUENCE SLIDER implements this approach for use in ARCIMBOLDO.
Molecular replacement (MR) is nowadays the most prevalent method of addressing the crystallographic ‘phase problem’. The phases are approximated by those derived from a homologous protein of known structure placed in the target unit cell (Rossmann & Blow, 1962). The implementation of more sensitive and accurate maximum-likelihood targets in MR (Read, 2001) allowed the advent of fragment-based methods, which lie between ab initio phasing (Usón & Sheldrick, 1999) and MR. Common secondary-structure or tertiary-structure fragments are used as search models. Thus, no specific structural knowledge of the target structure is required, but MR methods are needed to place the fragments correctly. It is then necessary to extend the partial structure composed of the fragments to a fairly complete, and thus interpretable, structure. Early methods explored the use of model α-helices (Glykos & Kokkinidis, 2003; Rodríguez et al., 2009) and RNA secondary-structure elements, combining manual map inspection, refinement, density modification and composite OMIT maps (Robertson & Scott, 2008; Robertson et al., 2010). Currently, a number of pipelines implement fragment-based phasing. They rely on the rotation (Storoni et al., 2004) and translation (McCoy et al., 2005) functions in Phaser to locate small, yet very accurate, fragments. Even when Phaser produces correct solutions, it may not be possible to distinguish them from the many false ones, because the expected log-likelihood gain (eLLG) is inconclusive. The eLLG is the LLG that a correctly placed fragment would be expected to render. For small search models, correct and incorrect solutions frequently have similar figures of merit. Thus, many hypotheses are pursued in parallel, and success in extending some of them into a full solution serves to identify the correct ones.
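The multi-solution strategy described above can be sketched in Python. This is a minimal illustration under stated assumptions and is not ARCIMBOLDO code. `pursue_in_parallel` and the `extend` callback are hypothetical names. `extend` stands in for the whole extension step, for example density modification and autotracing, and is assumed to return a correlation coefficient for the final trace. The default cutoff `success_cc=0.30` is an assumed rule of thumb, not a value taken from this work. The point of the sketch is that hypotheses are not discarded on their MR figures of merit. Every hypothesis is extended, and only the outcome of extension is used to pick the correct ones.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def pursue_in_parallel(hypotheses: List[str],
                       extend: Callable[[str], float],
                       success_cc: float = 0.30,
                       workers: int = 4) -> List[Tuple[str, float]]:
    """Extend every hypothesis and keep those whose final CC passes the cutoff.

    `extend` is a hypothetical stand-in for the extension step (e.g. density
    modification plus autotracing); it returns the CC of the final trace.
    """
    # All hypotheses are extended: similar MR scores give no reason to drop any.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        ccs = list(pool.map(extend, hypotheses))
    # Success in extension, not the starting figure of merit, flags a solution.
    hits = [(h, cc) for h, cc in zip(hypotheses, ccs) if cc >= success_cc]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Threads suffice for this sketch because, in practice, each extension would run as an external program. A process pool or a cluster scheduler would serve equally well.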
Extending partial polyalanine solutions from ARCIMBOLDO with side chains, modelled over a range of possible sequence assignments, may allow phasing to succeed where it would otherwise fail. The procedure implemented in SEQUENCE SLIDER has three steps: deriving possible hypotheses compatible with prior information, generating the extended models, and refining them. The prior information used is the alignment to a homolog, if available, and/or the secondary-structure prediction. LLG scoring is used both to guide the choice of the fragment to be extended and to select the refined models to be combined in a fresh round of fragment extension. Models with random sequence assignments are generated and included in the pool to provide a baseline. In no case did such models lead to a solution. When no clear path to model completion is apparent, the models are expanded through density modification and autotracing. Solutions can then be recognized by the correlation coefficient (CC) of the final traces. In simpler cases, a light version that extends every fragment as polyserine may suffice, whereas challenging cases require a finer side-chain assignment. This assignment can cover all side chains or be limited to hydrophobic residues, which tend to have lower B factors and favour fewer rotamers than polar side chains. The SEQUENCE SLIDER method, available through the ARCIMBOLDO distribution, has been instrumental in solving new protein structures.
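The hypothesis-generation step can be illustrated with a small Python sketch. It is not the SEQUENCE SLIDER implementation. The function names, the hydrophobic residue set `HYDROPHOBIC` and the rule that every residue in a window must match the predicted secondary structure are all assumptions made for illustration. The sketch only enumerates the sequence that each register offset would place on a polyalanine fragment traced N to C. Building side-chain coordinates, refinement and LLG scoring are out of scope.

```python
import random
from typing import Dict, List, Optional

# Illustrative hydrophobic set (assumption, not taken from SEQUENCE SLIDER).
HYDROPHOBIC = set("AVLIMFWC")


def slide_sequence(fragment_len: int, sequence: str,
                   ss_prediction: Optional[str] = None, ss_type: str = "H",
                   hydrophobic_only: bool = False) -> Dict[int, str]:
    """Map each sequence offset to the residues it would place on the fragment."""
    if ss_prediction is not None and len(ss_prediction) != len(sequence):
        raise ValueError("secondary-structure prediction must match sequence length")
    hypotheses = {}
    for offset in range(len(sequence) - fragment_len + 1):
        # Skip registers that contradict the secondary-structure prediction.
        if ss_prediction is not None:
            window_ss = ss_prediction[offset:offset + fragment_len]
            if any(s != ss_type for s in window_ss):
                continue
        window = sequence[offset:offset + fragment_len]
        if hydrophobic_only:
            # Model only hydrophobic side chains; polar positions stay Ala.
            window = "".join(r if r in HYDROPHOBIC else "A" for r in window)
        hypotheses[offset] = window
    return hypotheses


def polyserine(fragment_len: int) -> str:
    """'Light' extension: every position modelled as Ser."""
    return "S" * fragment_len


def random_baseline(fragment_len: int, sequence: str, n: int,
                    seed: int = 0) -> List[str]:
    """Random control assignments, drawn with replacement from the target sequence."""
    rng = random.Random(seed)
    return ["".join(rng.choice(sequence) for _ in range(fragment_len))
            for _ in range(n)]
```

For example, `slide_sequence(5, "MKLVEELLKKG", "CHHHHHHHHCC")` returns `{1: 'KLVEE', 2: 'LVEEL', 3: 'VEELL', 4: 'EELLK'}`. Only the four registers lying entirely within the predicted helix survive. With `hydrophobic_only=True`, polar positions stay as alanine, which mirrors the option of restricting side chains to hydrophobic residues.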