Date Published: April 01, 2016
Publisher: International Union of Crystallography
Author(s): Colin R. Groom, Ian J. Bruno, Matthew P. Lightfoot, Suzanna C. Ward.
This paper is the definitive article describing the creation, maintenance, information content and availability of the Cambridge Structural Database (CSD), the world’s repository of small-molecule crystal structures.
The ongoing stewardship of the Cambridge Structural Database (CSD) has been the core activity of the Cambridge Crystallographic Data Centre (CCDC) since its inception in 1965. The CCDC is committed to providing a permanent archive of crystal structures and making these available to all. This non-profit, charitable organization is overseen by an international board of trustees drawn from the community it serves.
In 2015 the number of entries in the CSD surpassed 800 000 (Fig. 1). This is twice the number the database held less than a decade earlier. Comparing current statistics with those of the database as it was then shows what has changed over that period, and what has not. Table 1 shows that the proportions of organic and metal–organic structures have remained fairly constant (we classify a structure as metal–organic if it contains a transition metal, lanthanide, actinide, or any of Al, Ga, In, Tl, Ge, Sn, Pb, Sb, Bi or Po). What has changed is the complexity of the structures being published: the average number of atoms per structure and the average molecular weight have increased (Fig. 2), as has the proportion of structures that are polymeric or that have resolved disorder (Fig. 3).
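The organic versus metal–organic rule quoted above can be sketched in a few lines of Python. This is an illustration only, not the CCDC's implementation: the function name `classify_structure` is hypothetical, and the membership of the transition-metal set (taken here as groups 3 to 12 of the d-block) is an assumption, since the text does not enumerate it.

```python
# Illustrative sketch only; not taken from any CSD software.
# ASSUMPTION: "transition metal" means d-block groups 3-12.
TRANSITION_METALS = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
    "Rf", "Db", "Sg", "Bh", "Hs", "Mt", "Ds", "Rg", "Cn",
}
LANTHANIDES = {
    "La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd",
    "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu",
}
ACTINIDES = {
    "Ac", "Th", "Pa", "U", "Np", "Pu", "Am", "Cm",
    "Bk", "Cf", "Es", "Fm", "Md", "No", "Lr",
}
# The additional elements named explicitly in the text.
OTHER_LISTED = {"Al", "Ga", "In", "Tl", "Ge", "Sn", "Pb", "Sb", "Bi", "Po"}

METAL_ORGANIC_ELEMENTS = (
    TRANSITION_METALS | LANTHANIDES | ACTINIDES | OTHER_LISTED
)


def classify_structure(elements):
    """Return 'metal-organic' if any listed element is present, else 'organic'.

    `elements` is any iterable of element symbols, e.g. {"C", "H", "Fe"}.
    """
    if METAL_ORGANIC_ELEMENTS & set(elements):
        return "metal-organic"
    return "organic"


if __name__ == "__main__":
    print(classify_structure({"C", "H", "O"}))        # no listed element
    print(classify_structure({"C", "H", "Fe"}))       # Fe is a transition metal
    print(classify_structure({"C", "H", "O", "Na"}))  # Na is not on the list
```

Under this reading, a sodium salt of an organic acid counts as organic, because alkali and alkaline-earth metals are not among the listed elements.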
What today we call the Cambridge Structural Database began life as ‘a computer-based file containing both bibliographic information and numerical data abstracted from the literature and relevant to molecular crystal structures, as obtained by diffraction methods’ (Kennard & Watson, 1970b). Work compiling this file began in 1965 and its contents were made available through the series of printed volumes, ‘Molecular Structures and Dimensions’ (Kennard & Watson, 1970a). Over time the file developed into a more structured database and the software used to generate and check data evolved into interactive applications enabling chemical searching and analysis of three-dimensional structural data. Together these provided the foundations for the rich suite of CSD-based applications available today.
The method for deposition into the CSD has evolved since the advent of the CIF format in the 1990s, when email depositions dominated. In 2009 the CCDC launched a web-based deposition tool, which is now the main route for deposition. In 2015, 90% of structures were deposited with the CCDC prior to publication and 85% were submitted through this service. A key benefit of early deposition is that the depositor is likely to be the crystallographer who generated the data, and so is in a position both to provide the richest data and to respond most effectively to any issues.
Since their inception, X-ray and neutron crystallography have been the methods of choice for the elucidation of the full three-dimensional structure of molecules (Wilkins, 2013). However, techniques involving electron diffraction (see for example Yun et al., 2015), atomic force microscopy (see for example Gross et al., 2009), free electron lasers (Barty et al., 2013) and NMR crystallography (Baias et al., 2013) have already shown their potential. In some cases, these techniques will capture molecular structures that are not in a crystalline lattice, so systems have been designed to accommodate such structures. Developments are also already underway to allow effective treatment of predicted crystal structures (Reilly et al., 2016).