Date Published: May 01, 2020
Publisher: International Union of Crystallography
Author(s): Long Yang, Pavol Juhás, Maxwell W. Terban, Matthew G. Tucker, Simon J. L. Billinge.
Structure-mining finds and returns the best-fit structures from structural databases given a measured pair distribution function (PDF) data set. Using databases and heuristics for automation, it has the potential to save experimenters a large amount of time as they explore candidate structures from the literature.
The development of science and technology is built on advanced materials, and new materials lie at the heart of technological solutions to major global problems such as sustainable energy (Moskowitz, 2009). However, discovering new materials remains labor-intensive and slow. The idea behind materials genomics (White, 2012) is to develop collaborations between materials scientists, computer scientists and applied mathematicians to accelerate the development of new materials through advanced computation such as artificial intelligence (AI), for example by predicting undiscovered materials with interesting properties (Jain et al., 2013; Simon et al., 2015; Curtarolo et al., 2013).
Structure-mining first obtains a large number of candidate structures from open structural databases. It then computes the PDFs of these structures and carries out structure refinements to obtain the best agreement between the calculated PDFs and the measured PDF under study. The initial implementation uses two commonly used open structural databases: the Materials Project Database (MPD) (Jain et al., 2013) and the Crystallography Open Database (COD) (Gražulis et al., 2009). The structures are fetched directly from the databases using the RESTful API (Ong et al., 2013, 2015). Many rules could be used to select which candidate structures to try. In this initial implementation of structure-mining, we use the following heuristics to filter which structure models to fetch: (1) all the structures that have the same stoichiometry as prescribed by the experimenter, (2) all the structures containing a prescribed list of elements, (3) all the structures containing the prescribed list of elements plus a number of additional elements specified by a wild-card symbol, (4) all the structures containing a subset of the prescribed elements plus other elements if a wild-card symbol is specified. These heuristics run from most to least restrictive and may be selected as desired. The results on representative data sets are presented below.
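The four filtering heuristics can be illustrated with a minimal Python sketch that works on chemical formula strings alone. It is not the structure-mining implementation and does not query the MPD or COD: the names `parse_formula`, `reduced`, `matches` and `select_candidates` are hypothetical, the formula parser handles only simple formulas without parentheses, and the reading of each heuristic (in particular heuristic 2 as "exactly the prescribed elements" and heuristic 4 as "at least one prescribed element, plus exactly the requested number of other elements") is an assumption made for illustration.

```python
import re
from functools import reduce
from math import gcd


def parse_formula(formula):
    """Parse a simple formula such as 'BaTiO3' into {'Ba': 1, 'Ti': 1, 'O': 3}.

    Illustrative only: parentheses and fractional occupancies are not handled.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def reduced(counts):
    """Divide all atom counts by their greatest common divisor."""
    divisor = reduce(gcd, counts.values())
    return {element: n // divisor for element, n in counts.items()}


def matches(candidate, query, heuristic, n_wildcards=0):
    """Return True if a candidate formula passes the chosen heuristic (1 to 4)."""
    cand, target = parse_formula(candidate), parse_formula(query)
    cand_elements, target_elements = set(cand), set(target)
    extras = cand_elements - target_elements  # elements not prescribed by the user
    if heuristic == 1:
        # Same stoichiometry as prescribed.
        return reduced(cand) == reduced(target)
    if heuristic == 2:
        # Assumed reading: exactly the prescribed elements, any ratio.
        return cand_elements == target_elements
    if heuristic == 3:
        # All prescribed elements plus n_wildcards additional elements.
        return target_elements <= cand_elements and len(extras) == n_wildcards
    if heuristic == 4:
        # Assumed reading: at least one prescribed element, plus n_wildcards others.
        return bool(cand_elements & target_elements) and len(extras) == n_wildcards
    raise ValueError("heuristic must be 1, 2, 3 or 4")


def select_candidates(candidates, query, heuristic, n_wildcards=0):
    """Keep the candidate formulas that pass the chosen heuristic."""
    return [c for c in candidates if matches(c, query, heuristic, n_wildcards)]


# A made-up list of formulas standing in for a database query result.
database = ["BaTiO3", "Ba2Ti2O6", "BaTi2O5", "BaTiO2N", "TiO2", "SrTiO3", "BaO"]
for h, wild in [(1, 0), (2, 0), (3, 1), (4, 1)]:
    print(h, select_candidates(database, "BaTiO3", h, wild))
```

In this toy example the selection widens from heuristic 1 to heuristic 2, and heuristics 3 and 4 admit formulas with one element outside the prescribed list. In a real workflow each surviving candidate would then be passed on to PDF calculation and refinement.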
In this paper, we have demonstrated a new approach, called structure-mining, for automated screening of large numbers of candidate structures against atomic pair distribution function (PDF) data. It automatically fetches candidate structures from structural databases and performs PDF structure refinements to obtain the best agreement between the calculated PDFs of the structures and the measured PDF under study. The approach has been successfully tested on the PDFs of a variety of challenging materials, including complex oxide nanoparticles and nanowires, low-symmetry structures, and complicated doped, magnetic, locally distorted and mixed-phase materials. This approach could greatly speed up and extend the traditional structure-searching workflow and enable highly automated, high-throughput real-time PDF analysis experiments in the future.