Research Article: Measuring and using information gained by observing diffraction data

Date Published: March 01, 2020

Publisher: International Union of Crystallography

Author(s): Randy J. Read, Robert D. Oeffner, Airlie J. McCoy.


The information content gained by making a diffraction-intensity measurement is a natural criterion for deciding which data make a useful contribution and which can legitimately be omitted from a calculation.

Partial Text

Likelihood-based methods are now used throughout crystallography to provide a probabilistic treatment of the effects of all sources of error in tasks such as phasing with a model (Read, 1986a), experimental phasing (de La Fortelle & Bricogne, 1997; McCoy et al., 2004), model refinement (Pannu & Read, 1996; Bricogne & Irwin, 1996; Murshudov et al., 2011) and molecular replacement (McCoy et al., 2007; Read & McCoy, 2016). In all of these areas, the introduction of likelihood has led to more powerful and robust methods.

It might be useful to provide a very rough correspondence between the mean information gain in the highest resolution shell and the mean I/σ ratio. There is not, of course, a one-to-one relationship between these quantities. As seen in Fig. 1, the information gain depends on both the intensity and its standard deviation: in that figure, observations with the same I/σ ratio lie on a line running through the origin, and observations with the same standard deviation lie on a horizontal line. Nonetheless, we can obtain an intuitive idea of how these quantities are related by considering some drastic simplifying assumptions.
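The dependence on both the intensity and its standard deviation can be illustrated numerically. The following is a minimal sketch, not the authors' implementation, and it rests on several assumptions that this excerpt does not state: the information gain is taken to be the Kullback-Leibler divergence from a Wilson prior to the posterior for the true intensity, the reflection is acentric, intensities are normalized so that the prior is exp(-Z), and the measurement error is Gaussian. The function name information_gain_bits and the integration grid are hypothetical choices made for illustration.

```python
import math


def information_gain_bits(i_obs, sigma, n_grid=20001):
    """Hypothetical helper: KL divergence (bits) from an acentric Wilson
    prior p(Z) = exp(-Z) to the posterior p(Z | i_obs), assuming Gaussian
    measurement error with standard deviation sigma (normalized units)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    # Under these assumptions the posterior is a Gaussian with mean
    # mu = i_obs - sigma^2 and s.d. sigma, truncated to Z >= 0.
    mu = i_obs - sigma ** 2
    lo = max(0.0, mu - 10.0 * sigma)
    hi = max(0.0, mu) + 10.0 * sigma
    step = (hi - lo) / (n_grid - 1)
    zs = [lo + k * step for k in range(n_grid)]

    # Unnormalized log posterior: log prior + log likelihood (constants dropped).
    log_w = [-z - (z - i_obs) ** 2 / (2.0 * sigma ** 2) for z in zs]
    shift = max(log_w)  # subtract the maximum to avoid underflow
    w = [math.exp(lw - shift) for lw in log_w]

    # Trapezoidal normalization of the posterior.
    norm = step * (sum(w) - 0.5 * (w[0] + w[-1]))
    log_norm = math.log(norm)

    # KL = E_post[ln posterior - ln prior], with ln prior = -Z.
    f = [wk / norm * (lw - shift - log_norm + z)
         for wk, lw, z in zip(w, log_w, zs)]
    kl_nats = step * (sum(f) - 0.5 * (f[0] + f[-1]))
    return kl_nats / math.log(2.0)


if __name__ == "__main__":
    # Two pairs of observations sharing the same I/sigma ratio.
    for i_obs, sigma in [(2.0, 1.0), (0.2, 0.1), (1.0, 1.0), (0.1, 0.1)]:
        gain = information_gain_bits(i_obs, sigma)
        print(f"I={i_obs:4.1f} sigma={sigma:4.1f} "
              f"I/sigma={i_obs / sigma:4.1f} gain={gain:6.3f} bits")
```

Under these assumptions, observations that share an I/σ ratio but differ in σ yield different gains, and the gain tends to zero as σ becomes large compared with the expected intensity.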

As demonstrated here, including weak data in crystallographic calculations adds signal and can even make the difference between success and failure. With proper accounting for the effects of measurement errors, such as in the LLGI target used for molecular replacement in Phaser, even data with negligible signal can now be accommodated without the danger of adding noise. This allows structures to be determined more readily, even if they suffer from effects such as strong diffraction anisotropy or tNCS. The potential disadvantage of increasing computational cost without any added benefit can be avoided by using the close relationship between likelihood and information gain to identify the observations that can legitimately be ignored. However, when optimal treatments for measurement error are not used, more care must be taken in choosing which data to include.