Date Published: June 11, 2017
Author(s): Thomas Luechtefeld, Alexandra Maertens, Daniel P. Russo, Costanza Rovida, Hao Zhu, Thomas Hartung.
As of December 2014, the public data from REACH registrations already included 19,111 skin sensitization studies, making it the largest repository of such data so far (1,470 substances with mouse LLNA, 2,787 with GPMT, 762 with both in vivo and in vitro data, and 139 with only in vitro data). Of the substances, 21% were classified as sensitizers. The extracted data were analyzed to identify relationships between skin sensitization guidelines, visualize structural relationships among sensitizers, and build models to predict sensitization.
While computational toxicology has recently seen the collection of several large-scale datasets (e.g., US EPA’s ToxCast, the Tox21 alliance of US agencies), the data collected under REACH (Regulation (EC) 1907/2006), owing to its legislative nature as a central repository for testing data, is today the largest collection of toxicology data from in vitro and in vivo studies. However, REACH dossiers submitted to the European Chemicals Agency (ECHA) are currently not in a machine-readable format, and any workflow involving the public summary data in REACH depends on a slow and error-prone process of manual extraction.
Multiple programming languages, packages and database tools were used in the development of this project, including Scala, Java, Python, MongoDB, SQL, HtmlUnit and Gephi. For details see Luechtefeld et al. (2016, this issue).
This analysis of skin sensitization data for industrial chemicals registered in REACH shows the wealth of information that can be used to tailor and optimize future testing strategies. Our preliminary analysis shows how such a large dataset can be leveraged, which can considerably reduce testing needs and related costs (Hartung and Rovida, 2009; Rovida and Hartung, 2009). Computational approaches benefit enormously from the amount and quality of the data. Even relatively simple algorithms can make reasonable predictions, which also supports the concept of read-across (Patlewicz et al., 2014). Notably, the majority of new tools and testing strategies developed over the last decade were based on the same set of only about 145 substances (Natsch et al., 2013). In a data-rich environment, reasonable conclusions can be drawn from neighboring substances, especially when structural properties are backed by biological profiling (Zhu et al., 2016, this issue).
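To illustrate how conclusions might be drawn from neighboring substances, the following is a minimal Python sketch of a nearest-neighbor read-across vote. It is not the method used in this work: the Tanimoto similarity on sets of structural features, the majority vote over k neighbors, the function names and the toy feature sets are all assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(query, labelled, k=3):
    """Predict sensitization by majority vote among the k most similar substances.

    labelled is a list of (feature_set, is_sensitizer) pairs (hypothetical data layout).
    """
    ranked = sorted(labelled, key=lambda item: tanimoto(query, item[0]), reverse=True)
    top = ranked[:k]
    votes = sum(1 for _, is_sensitizer in top if is_sensitizer)
    return votes * 2 > len(top)


# Toy, made-up substances described by hypothetical structural features.
labelled = [
    ({"aldehyde", "aromatic"}, True),
    ({"aldehyde", "alkene"}, True),
    ({"alcohol", "alkane"}, False),
    ({"alcohol", "aromatic"}, False),
]
query = {"aldehyde", "aromatic", "alkene"}
prediction = read_across(query, labelled, k=3)
```

In practice the feature sets would come from chemical fingerprints, and the choice of k and of the similarity threshold would need to be validated against held-out data.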
The characterization of the REACH chemical universe (at the time of extraction, 2014) in the context of skin sensitization showed that 21% of these predominantly high-production-volume chemicals are sensitizers. The large number of repeat studies and the overlap of methods used to assess many chemicals allowed the reproducibility of the in vivo methods to be investigated. Reproducibility ranged from 80% to 90%; therefore, no alternative method or integrated testing strategy (ITS) should be expected to perform better than this as long as these tests serve as the points of reference.
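One simple way to quantify reproducibility from repeat studies is pairwise concordance: the fraction of study pairs on the same substance that reach the same call. The Python sketch below assumes this definition and a hypothetical data layout of (substance, outcome) records; the exact measure used in the analysis is not stated here, and the toy data are invented.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_concordance(studies):
    """Fraction of within-substance study pairs with the same outcome.

    studies is an iterable of (substance_id, is_sensitizer) records.
    Returns None when no substance has more than one study.
    """
    by_substance = defaultdict(list)
    for substance_id, outcome in studies:
        by_substance[substance_id].append(outcome)

    agree = 0
    total = 0
    for outcomes in by_substance.values():
        # Compare every pair of repeat studies on the same substance.
        for first, second in combinations(outcomes, 2):
            total += 1
            if first == second:
                agree += 1
    return agree / total if total else None


# Toy, made-up records: substance "A" has three studies, "B" two, "C" one.
toy_studies = [
    ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False),
    ("C", True),
]
concordance = pairwise_concordance(toy_studies)
```

A concordance computed this way sets a practical ceiling for evaluating alternative methods, since a new method cannot agree with a reference test more consistently than that test agrees with itself.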