Research Article: Methods for merging data sets in electron cryo-microscopy

Date Published: September 01, 2019

Publisher: International Union of Crystallography

Author(s): Max E. Wilkinson, Ananthanarayanan Kumar, Ana Casañal.


A workflow to combine cryo-EM data collected at different magnifications.

Partial Text

Over the last decade, electron cryo-microscopy (cryo-EM) has become a powerful tool to resolve three-dimensional (3D) structures of biological specimens at a resolution sufficient for proposing de novo atomic models (Kühlbrandt, 2014; Smith & Rubinstein, 2014; Cheng et al., 2017). This has primarily been possible through progress made in the development of direct electron detectors (Battaglia et al., 2009; Faruqi & McMullan, 2011; Li et al., 2013; McMullan et al., 2014) and improvements in image-processing algorithms (Scheres & Carazo, 2009; Scheres, 2012). The result of these advances is a rapid growth in the number of cryo-EM structures deposited per year in the Electron Microscopy Data Bank (EMDB;,

For several reasons, it may not be possible to precisely calculate the correct scaling factor. For example, the resolution of the individual maps may not be high enough for a discrete peak of correlation to emerge when aligning maps in Chimera, or a convenient ratio of box sizes that gives the correct scaling factor may not be available. We tested how accurately the scaling factor needs to be determined (Fig. 6). We scaled data set II in the spliceosome example using different starting pixel sizes: the correct 0.880 Å per pixel, the almost correct 0.884 Å per pixel, the incorrect 0.940 Å per pixel and two close sizes of 0.860 and 0.900 Å per pixel [Fig. 6(a)]. For each case we recalculated the CTF parameters using Gctf (Zhang, 2016), extracted particles with a 560-pixel box, scaled them to various box sizes (e.g. 440 pixels for 0.880 Å per pixel, 470 pixels for 0.940 Å per pixel) and cropped them to 420 pixels. After refinement, we determined the resolution using the same mask [Fig. 6(b)]. We found that using 0.884 Å per pixel gave a reconstruction that was almost identical to that obtained with the correct 0.880 Å per pixel. Either 0.860 or 0.900 Å per pixel gave only a small reduction in resolution, while 0.940 Å per pixel gave a reconstruction with worse resolution than either of the starting data sets alone [Fig. 6(b)]. This analysis shows that the resolution obtained after reconstruction from merged data sets is relatively tolerant to scaling-factor error, at least in the 3.70 Å resolution range for a 1–2 MDa particle: 0.5–2% error is acceptable, while 7% error is not. In principle, higher accuracy should be required when analysing a larger complex or a higher resolution structure (although both of these factors should also facilitate more accurate scaling-factor determination).
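The box-size arithmetic in this test can be sketched in a few lines of Python. This is only an illustration, not the authors' script: the function names are hypothetical, and the target pixel size of 1.12 Å per pixel is an assumption inferred from the quoted examples (560 × 0.880 / 440 = 560 × 0.940 / 470 = 1.12), not a value stated in this passage.

```python
def scaled_box_size(box_px, pixel_size, target_pixel_size):
    """Box size (in pixels) to which a particle box must be rescaled
    so that its pixels match target_pixel_size (both in Å per pixel)."""
    return round(box_px * pixel_size / target_pixel_size)


def percent_error(assumed, correct):
    """Relative error (in percent) of an assumed pixel size."""
    return 100.0 * abs(assumed - correct) / correct


# Assumption: target pixel size inferred from the 560 -> 440 example above.
TARGET_PIXEL_SIZE = 1.12
CORRECT_PIXEL_SIZE = 0.880  # correct pixel size of data set II (from the text)

for assumed in (0.860, 0.880, 0.884, 0.900, 0.940):
    box = scaled_box_size(560, assumed, TARGET_PIXEL_SIZE)
    err = percent_error(assumed, CORRECT_PIXEL_SIZE)
    print(f"{assumed:.3f} Å/pixel -> scale 560 px to {box} px "
          f"(pixel-size error {err:.2f}%)")
```

Under these assumptions the relative errors come out at roughly 0.5% for 0.884 Å per pixel, 2.3% for 0.860 and 0.900 Å per pixel, and 6.8% for 0.940 Å per pixel, which matches the tolerance ranges quoted above.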

In cryo-EM, protein samples often require extensive biochemical optimization, including sample preparation and vitrification. To obtain structural information that helps to gain insight into specific biological problems, researchers often acquire large data sets or collect data from different preparations. Merging data sets recorded on different microscopes, or under different imaging conditions, therefore becomes an important task. In this report, we describe two methods (scaling micrographs or particles) to successfully combine cryo-EM data collected at different pixel sizes. We also show how errors in pixel-size determination correlate with resolution and provide scripts for the accurate determination of pixel sizes. In our examples, data sets from the same type of microscope (Titan Krios) and detector (K2 equipped with a GIF energy filter) have been merged, improving the resolution. This methodology can be further extended to other cases in which cryo-EM data sets have been acquired with different types of microscopes and detectors. When combining data collected using different detectors, each detector will have a specific MTF file. Different MTF files are likely to have a minimal effect on the final 3D reconstruction, but further analysis is necessary to report their impact at near-Nyquist resolution. It should be noted that adding more data to existing data sets does not always improve the quality of 3D reconstructions. For example, when data quality limits resolution, additional particles will have only a marginal effect. To decide whether further data collection is useful, a good strategy is to merge the existing data sets one by one (starting with the best). This will determine whether combining more particles of similar quality improves the quality of the 3D reconstruction.
To estimate the number of particles (of similar quality) required to increase resolution, one can use Rosenthal and Henderson plots (Rosenthal & Henderson, 2003; Zivanov et al., 2018). In our examples, the addition of about 50% more particles helped to increase the resolution of the 3D reconstructions from 3.61 to 3.50 Å for the polymerase module of CPF, and the addition of 100% more particles increased the resolution from 3.89 to 3.70 Å for the post-catalytic (P complex) spliceosome. Importantly, improving the quality of the data will also improve the resolution of the resulting 3D reconstruction (Naydenova & Russo, 2017). For example, the correction of beam tilt using the new tools in RELION 3.0 (Zivanov et al., 2018) improves the resolution of the polymerase module from data set I to a greater extent than merging with data set II.
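A Rosenthal and Henderson plot relates the logarithm of the particle number N to the inverse squared resolution 1/d², with a slope of B/2, where B is an overall B factor. The sketch below is a hypothetical illustration of that relation, not the authors' analysis: it derives B from just two points (a real plot would fit many random subsets of one data set), it assumes all particles are of similar quality, and the function names are invented for this example.

```python
import math


def b_factor_from_two_points(n1, d1, n2, d2):
    """Crude B-factor estimate (Å^2) from two (particle number, resolution)
    points, assuming ln(N) = (B / 2) * (1 / d^2) + constant."""
    return 2.0 * math.log(n2 / n1) / (1.0 / d2**2 - 1.0 / d1**2)


def particles_needed(n_ref, d_ref, d_target, b_factor):
    """Particle number predicted to reach d_target (Å), extrapolating from
    a reference point (n_ref, d_ref) along the same straight line."""
    return n_ref * math.exp(
        0.5 * b_factor * (1.0 / d_target**2 - 1.0 / d_ref**2)
    )


# Relative particle numbers from the polymerase-module example in the text:
# 1.0x particles gave 3.61 Å and about 1.5x particles gave 3.50 Å.
b = b_factor_from_two_points(1.0, 3.61, 1.5, 3.50)

# Hypothetical question: how many particles (relative to data set I alone)
# would the same line predict for 3.40 Å?
n_target = particles_needed(1.5, 3.50, 3.40, b)
print(f"Estimated B factor: {b:.0f} Å^2")
print(f"Relative particle number for 3.40 Å: {n_target:.2f}x")
```

Because the two points here come from merged data sets rather than random subsets of one data set, the resulting numbers should be read as an order-of-magnitude guide only.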

The following references are cited in the supporting information for this article: Scheres (2014), Zhang (2016) and Zheng et al. (2017).



