Date Published: October 01, 2019
Publisher: International Union of Crystallography
Author(s): D. Maluenda, T. Majtner, P. Horvath, J. L. Vilas, A. Jiménez-Moreno, J. Mota, E. Ramírez-Aportela, R. Sánchez-García, P. Conesa, L. del Caño, Y. Rancel, Y. Fonseca, M. Martínez, G. Sharov, C.A. García, D. Strelak, R. Melero, R. Marabini, J. M. Carazo, C. O. S. Sorzano.
The Scipion framework allows very flexible image-processing workflows to be generated and employed at electron-microscopy facilities, such that image acquisition can be monitored and possible problems detected, enabling early decisions to be made on the fly. The streaming workflow can be very simple or quite extended, permitting the data resolution and heterogeneity to be estimated and the workflow to be adapted to the needs of the user and the microscope operator.
Electron microscopy (EM) has become an established technique for determining the three-dimensional structure of biological macromolecules (Frank, 2017 ▸). Owing to the high cost of the electron microscope itself, with all its components (direct electron detector camera, phase plates, spherical aberration correctors etc.), the current trend is to build large EM facilities that concentrate high-end machines and offer their services to a large community of users. In such circumstances, it is advisable for users to have previously screened the quality of their samples on more modest electron-microscopy setups.
The electron microscope takes a collection of images of each field of view with very short exposure times; each image is called a frame. One of the key advances in the field was the realization that the sample is not static in space, but rather moves during the exposure (Brilot et al., 2012 ▸). For this reason, frames must be aligned before they can be averaged into an electron micrograph. The signal-to-noise ratio (SNR) of these frames is extremely low (between 1/200 and 1/5000), so the alignment algorithms must be extremely robust to noise and must tolerate incorrect estimates of the alignment between any two frames. To perform this task, Scipion enables movie alignment in streaming through Xmipp Correlation, Unblur and Summovie (Campbell et al., 2012 ▸; Grant & Grigorieff, 2015 ▸), MotionCor2 (Zheng et al., 2017 ▸) and Xmipp Optical flow alignment (Abrishami et al., 2015 ▸). We can consider these algorithms to be estimators of the deformation field between each of the frames and the final micrograph. The global alignment programs (Xmipp Correlation and Unblur) can be regarded as zeroth-order Taylor approximations of these deformation fields; MotionCor2 allows a parabolic deformation, corresponding to a second-order approximation; and Xmipp Optical flow corresponds to a higher-order approximation in which each pixel in the frames can move freely in any direction (with some regularization to ensure the smoothness of the deformation field). With the exception of MotionCor2, these programs struggle to keep pace with real-time processing on a single CPU, principally because their processing time may exceed the acquisition time. However, this is not a problem if multiple CPUs are available (depending on the data size, four or eight CPUs are normally sufficient), and it is certainly not a limitation if the alignment jobs are submitted to a cluster (for example, through a queuing system).
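The zeroth-order (global-shift) case can be illustrated with a minimal sketch. Assuming each frame is related to a reference by a pure translation, an FFT-based cross-correlation recovers the shift; this is a toy model of the idea, not the implementation used by Xmipp Correlation or Unblur.

```python
import numpy as np

def global_shift(frame, reference):
    """Estimate the integer (dy, dx) translation that best aligns
    `frame` to `reference`, via FFT-based cross-correlation."""
    cc = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(frame))).real
    dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
    h, w = cc.shape
    # Map peaks in the upper half of the correlation map to negative shifts
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return dy, dx

def align_and_average(frames):
    """Zeroth-order ('global') alignment: shift every frame onto the
    first one, then average the stack into a micrograph."""
    reference = frames[0]
    aligned = [np.roll(f, global_shift(f, reference), axis=(0, 1)) for f in frames]
    return np.mean(aligned, axis=0)
```

In practice the frames' very low SNR is what makes the real estimators hard: they must locate this correlation peak reliably in images far noisier than this sketch assumes.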
Scipion's streaming execution automatically handles jobs that have finished and whose outputs are ready for the next step in processing.
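This hand-off can be pictured with a toy model (not Scipion's actual scheduler): two streaming steps chained by queues, so that each movie flows to the next step as soon as the previous step finishes it, without waiting for the whole acquisition.

```python
import queue
import threading

def stream_step(process, inbox, outbox):
    """Toy streaming protocol: consume items as soon as the previous
    step finishes them; a None item marks the end of the stream."""
    while (item := inbox.get()) is not None:
        outbox.put(process(item))
    outbox.put(None)                     # propagate end-of-stream downstream

# Chain two illustrative steps ("align" then "ctf") with queues
movies, micrographs, ctfs = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stream_step,
                 args=(lambda m: m + "_aligned", movies, micrographs)).start()
threading.Thread(target=stream_step,
                 args=(lambda m: m + "_ctf", micrographs, ctfs)).start()

for m in ["movie_001", "movie_002"]:     # movies 'arrive' from the microscope
    movies.put(m)
movies.put(None)

results = []
while (r := ctfs.get()) is not None:
    results.append(r)
```

Each stage here is a free-running worker, which is the essence of streaming: downstream steps start as soon as their first input exists.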
The stream processing described in the previous section used to be the only stream processing performed at EM facilities, and as such it can be regarded as a characterization and monitoring of the ‘functioning’ of the machine. At the end of the acquisition period, the user is given a report indicating the number of micrographs acquired and some statistics on the defocus values, alignment shifts and expected resolution (from the CTF point of view). However, this analysis is not especially informative about the quality of the sample itself. To obtain a better analysis of the sample, image-processing packages now continue with the subsequent processing steps. The next step is therefore to find particles in the micrographs, which can be performed in four different ways using the programs available in Scipion v.2.0.
(i) By looking for objects of a given size [SPARX Gaussian picker (Hohn et al., 2007 ▸), RELION Gaussian picking (Scheres, 2014 ▸) and Appion DoG picker (Voss et al., 2009 ▸)].
(ii) By using a picker trained on a variety of micrographs (Sphire-crYOLO; Wagner et al., 2019 ▸).
(iii) By learning the kind of particles to select (Xmipp auto-picking; Abrishami et al., 2013 ▸).
(iv) By using templates to match areas of the micrographs, where these templates may come from a 2D analysis of the first micrographs, with particles selected manually or by any template-free picker, or from projections of a structure similar to that under study (Gautomatch and RELION reference-based picker; Scheres, 2014 ▸).
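The size-based approach of (i) can be sketched with a toy difference-of-Gaussians picker: band-pass the micrograph at the particle scale, then keep local maxima above a threshold. The function, its scale heuristic and its parameters are illustrative, not the Appion implementation.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Gaussian filtering in Fourier space (periodic boundaries)."""
    ky = np.fft.fftfreq(img.shape[0])[:, None]
    kx = np.fft.fftfreq(img.shape[1])[None, :]
    kernel = np.exp(-2 * np.pi ** 2 * sigma ** 2 * (ky ** 2 + kx ** 2))
    return np.fft.ifft2(np.fft.fft2(img) * kernel).real

def dog_pick(micrograph, particle_diameter, threshold):
    """Band-pass at the particle scale, then keep local maxima:
    the essence of a difference-of-Gaussians (DoG) picker."""
    sigma = particle_diameter / 4.0                     # heuristic scale choice
    dog = gaussian_blur(micrograph, sigma) - gaussian_blur(micrograph, 1.6 * sigma)
    half = int(particle_diameter) // 2
    picks = []
    for r, c in np.argwhere(dog > threshold):           # candidate pixels
        patch = dog[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1]
        if dog[r, c] == patch.max():                    # local maximum in window
            picks.append((int(r), int(c)))
    return picks
```

Pickers of type (ii)–(iv) replace the fixed band-pass kernel with learned or template-derived filters, but the detect-then-threshold structure is similar.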
Once we have a streaming set of particles, the standard image-processing workflow proceeds with 2D classification, which fulfills a series of objectives.
(i) To compress the set of several thousand projection images with a very low SNR into a comprehensible set of 2D averages with a higher SNR.
(ii) To identify incorrectly selected images: images characterized as different from those belonging to the core or stable core of the class (Sorzano et al., 2014 ▸), or images assigned to a 2D class whose representative does not correspond to a centered projection of the structure under study, for instance an artifact, a view caught between two particles or an empty region of the micrograph.
(iii) To evaluate the acquisition quality through the frequency content of the 2D averages: good acquisitions normally generate 2D classes with a very high frequency content, while acquisitions that are limited in resolution for any reason, or that suffer from image-alignment problems, produce poorly resolved 2D classes.
(iv) To evaluate the quality of the region currently being imaged.
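Objective (i) rests on a standard fact: averaging n aligned, independently noisy images raises the SNR roughly n-fold in power. A small numerical check of this, using a 1D sine as a stand-in for an aligned projection:

```python
import numpy as np

rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 2 * np.pi, 256))   # stand-in for an aligned projection

def empirical_snr(n_images, noise_sigma=10.0):
    """SNR (signal power over residual noise power) of the average
    of n aligned, independently noisy copies of the signal."""
    avg = np.mean([signal + rng.normal(0, noise_sigma, signal.size)
                   for _ in range(n_images)], axis=0)
    return np.var(signal) / np.var(avg - signal)
```

With `noise_sigma=10.0` a single copy has an SNR near 1/200, i.e. the low end of the frame SNRs quoted above, while the average of 100 copies improves it by about two orders of magnitude.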
In some projects, the initial volume is known from the very beginning, before any image processing is performed (for instance, when studying the structure of a macromolecule bound to a ligand if the structure of the macromolecule without the ligand is already known). However, in many other projects this initial volume is not known, or constructing the initial volume from the data itself serves as a validation of any prior assumptions. The construction of this volume does not need to be performed in streaming, as an initial volume can be calculated once a given number of particles has been reached (typically between 5000 and 10 000). Scipion offers several algorithms for this: EMAN, Xmipp Ransac (Vargas et al., 2014 ▸), Xmipp Significant (Sorzano et al., 2015 ▸), RELION (Scheres, 2016 ▸) and Simple Prime3D (Elmlund et al., 2013 ▸). They normally work on class averages, although some of them can also work on a set of particles. These algorithms produce one or several candidate initial volumes; the proportion of incorrect candidates depends on the particular specimen and may be non-negligible. Typically the user has to choose one of them as the initial volume to continue the study, and this choice can represent an important bias in the overall analysis. Recently, we introduced an algorithm called Xmipp Swarm consensus that can automatically calculate a consensus of initial volumes (Sorzano, Vargas et al., 2018 ▸) from a set of initial volume proposals and a set of particles. In this way, the selection is less biased. In the example that illustrates the streaming capacities of Scipion shown here, we used EMAN, Xmipp Ransac and Xmipp Significant, combining them into a single volume using Xmipp Swarm consensus (see Fig. 6 ▸).
Most of the resource-intensive and time-consuming protocols in the full streaming process lie in the first steps of the workflow, especially the movie-alignment algorithms, followed by the CTF estimators and the pickers (Table 1 ▸). The remaining protocols up to the 2D classification can seamlessly follow the acquisition rate, because the vast majority of their tasks involve database operations or downsampled images. From the 2D classification onwards, the work proceeds in batches of data, so the performance can be adjusted by tuning the batch sizes.
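The rate-matching reasoning behind these resource estimates is simple: a streaming step keeps pace with acquisition when enough parallel jobs run that movies leave the step as fast as new ones arrive. A back-of-the-envelope sketch (the timings are illustrative, not measurements):

```python
import math

def workers_needed(processing_s, acquisition_s):
    """Minimum number of parallel jobs for a streaming step to keep
    pace: one finished item per acquisition interval, on average."""
    return math.ceil(processing_s / acquisition_s)

# Illustrative example: aligning one movie takes 180 s on one CPU while
# the microscope delivers a movie every 45 s, so four parallel alignment
# jobs keep up; slower steps need proportionally more workers.
```

This is the same arithmetic that makes four or eight CPUs, or a cluster queue, sufficient for the alignment step discussed earlier.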
Each EM facility and/or user may use a customized streaming image-processing workflow. At the most basic level, Scipion accepts two possible ways of creating and launching these workflows.
(i) Using the Scipion API to create an empty project and then build the image-processing pipeline by adding new protocol objects, linking their inputs and outputs as required. This option requires Python programming skills, although it is very flexible, as the user (programmer) has all the protocols to hand and is completely free to create any possible workflow.
(ii) Creating a workflow from a workflow template. Templates can be created by exporting an existing workflow as a JSON file, which is then imported and scheduled for execution. There is a repository of publicly available workflows at http://workflows.scipion.i2pc.es that can be downloaded and used at will (see Fig. 7 ▸).
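Option (ii) can be sketched as follows: an exported workflow is a JSON list of protocol descriptions, and the snippet below builds a minimal two-step template in that style. The class names, keys and file path shown are illustrative and depend on the installed Scipion version; treat this as a sketch of the format, not a validated template.

```python
import json

# Hypothetical two-step template: import movies, then align them.
# Keys such as "object.className" mimic the style of exported workflows;
# "1.outputMovies" links the second protocol's input to the first's output.
template = [
    {"object.className": "ProtImportMovies", "object.id": "1",
     "object.label": "import movies", "filesPath": "/data/session/movies"},
    {"object.className": "ProtMotionCorr", "object.id": "2",
     "object.label": "align movies", "inputMovies": "1.outputMovies"},
]

serialized = json.dumps(template, indent=2)
# Saved as a .json file, such a template can be imported into a project
# and scheduled for execution, with inputs linked to upstream outputs.
```

Because the template is plain JSON, a facility can keep a library of such files, edit parameters per session and share them through repositories such as the one mentioned above.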
In this article, we have presented the possibilities that Scipion offers for designing on-the-fly image-processing workflows. We have illustrated these possibilities with a particular workflow, but simpler or more complex workflows could equally be designed and deployed at any EM facility. The customization of the workflow may depend on the computing capabilities available, the goal of the analysis (characterizing the microscopy session or the sample), the experience of the EM operator and user etc. Scipion supports a wide variety of algorithms for each task and, although not all of them can run in streaming mode, as shown here this does not prevent them from being part of useful streaming workflows.