Date Published: March 01, 2020
Publisher: International Union of Crystallography
Author(s): Ana Medina, Josep Triviño, Rafael J. Borges, Claudia Millán, Isabel Usón, Massimo D. Sammito.
ALEPH characterizes the main-chain geometry of small, noncontinuous fragments to flexibly annotate secondary structure, decompose folds, extract libraries and superpose fragments. Secondary and tertiary structure are described through networks of characteristic vectors, which are defined between the centroids of the Cα and carbonyl O atoms in a peptide.
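As a minimal sketch of this idea, the vector from the centroid of the Cα atoms to the centroid of the carbonyl O atoms can be computed over a sliding window of residues. The window size of three residues, the function names and the toy coordinates are assumptions made for illustration; they are not taken from the text above and this is not ALEPH's own code.

```python
import math


def centroid(points):
    """Arithmetic mean of a sequence of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def characteristic_vectors(ca_coords, o_coords, window=3):
    """Return one vector per overlapping window of residues.

    ca_coords, o_coords: per-residue (x, y, z) positions of the C-alpha
    and carbonyl O atoms, in chain order.
    window: residues per peptide fragment (assumed value, see lead-in).
    """
    if len(ca_coords) != len(o_coords):
        raise ValueError("need one CA and one O position per residue")
    vectors = []
    for start in range(len(ca_coords) - window + 1):
        ca_centroid = centroid(ca_coords[start:start + window])
        o_centroid = centroid(o_coords[start:start + window])
        # Vector pointing from the CA centroid to the O centroid.
        vectors.append(tuple(o - c for c, o in zip(ca_centroid, o_centroid)))
    return vectors


def angle_between(u, v):
    """Angle in degrees between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


# Toy coordinates (not from a real structure): four residues on a line,
# each carbonyl O displaced by one unit along y.
ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ox = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
cvs = characteristic_vectors(ca, ox)
```

Comparing such vectors pairwise, for example through `angle_between`, is one plausible way to relate fragments to each other in a network; the criteria that ALEPH actually applies are not specified in this section.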
Secondary-structure properties are usually derived from the hydrogen-bond pattern. They were predicted even before the structures of full proteins had been determined (Pauling et al., 1951; Pauling & Corey, 1951). Analysing this network implies assessing the environment of each amino acid in a peptide: an environment made up of nonconsecutive residues, which may include symmetry equivalents that are not explicitly contained in the PDB set of coordinates. The formation of these hydrogen bonds and the planarity of the peptide bond restrict the protein backbone to torsion-angle values in characteristic ranges, corresponding to the most populated areas of the Ramachandran plot (Ramachandran et al., 1963). Conversely, the analysis of the relevant torsion angles may suffice to characterize the secondary structure. Define Secondary Structure of Proteins (DSSP) is the standard algorithm employed for the prediction of hydrogen positions and bonds, from which the secondary-structure environment for each residue can be derived (Kabsch & Sander, 1983; Touw et al., 2015). Distortions in the polypeptide chain are sometimes encountered and, especially when the resolution is worse than 3–3.5 Å (Headd et al., 2012; Karmali et al., 2009), some structures may fail to meet the regularity that DSSP requires. DipSpace (Pereira & Lamzin, 2017) embeds geometrical information about the backbone atoms around each Cα atom in its dipeptide-unit environment, which is described as a matrix of the interatomic distances. Also, CaBLAM (Richardson et al., 2018) defines a novel parameter space of Cα–Cα and CO–CO virtual dihedrals, where the CO dimension diagnoses large distortions of peptide orientation at low resolution and the two Cα dimensions identify the probable secondary structure obscured by these problems. CaBLAM is designed for structure validation to detect errors in the model, whereby poor geometry introduces ambiguity.
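To make the torsion-angle route concrete, the following sketch computes a signed dihedral angle from four atomic positions and applies a deliberately crude Ramachandran-style classification. The atom quadruples used for φ and ψ are the standard ones; the rectangular angle ranges in `classify`, and all function names, are illustrative assumptions and are not the criteria used by DSSP, CaBLAM or ALEPH.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


def classify(phi_deg, psi_deg):
    """Very rough label from (phi, psi); the ranges are assumed, not canonical."""
    if -100.0 <= phi_deg <= -30.0 and -80.0 <= psi_deg <= -5.0:
        return "helix"
    if -180.0 <= phi_deg <= -45.0 and 90.0 <= psi_deg <= 180.0:
        return "strand"
    return "other"
```

For example, `classify(-60.0, -45.0)` falls in the assumed helical box and `classify(-120.0, 130.0)` in the assumed extended box. Real assignments use populated Ramachandran regions rather than rectangles, and generally require several consecutive residues to agree.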
Recent developments in molecular replacement (MR) have formally bound the solvability of the phase problem to an estimated log-likelihood gain (eLLG; McCoy et al., 2017), allowing the minimum fractional scattering that is needed at a given accuracy to be established a priori (Oeffner et al., 2013). The eLLG score is used in the fragment-based MR approach ARCIMBOLDO to guide the difficult trade-off between fragment generality and solution discrimination (Oeffner et al., 2018). While minimal fragments, such as simple secondary-structure elements, are ubiquitous across structures, their correct location usually yields only a weak signal. Small local folds, defined as composite sets of discontinuous secondary-structure elements (for example, three antiparallel β-strands facing two parallel helices), are still ubiquitous across different families of structures but, unlike α-helices, cannot be represented accurately enough by a single model that will match the corresponding geometry in most unknown target structures. In this context, we developed ALEPH as a bioinformatics tool to prepare libraries representing variations of a given fold for MR. The extraction of such libraries is performed without relying on sequences and alignments, which allows searches across different families.
Some of the libraries previously created with ALEPH are distributed with CCP4 for use as input search models in ARCIMBOLDO_BORGES. Recently, new libraries exploring more complex folds have been prepared and are available through our webpage. Table 2 lists the currently available libraries.
This work introduces the new software ALEPH, a graph-based tool to annotate secondary and tertiary structure from coordinates, decompose a structure into compact local small folds, extract local folds from a database of structures without using the sequence and generate libraries of such folds, which are especially useful as input search models for fragment-based MR.