Research Article: Machine learning for predicting lifespan-extending chemical compounds

Date Published: July 18, 2017

Publisher: Impact Journals LLC

Author(s): Diogo G. Barardo, Danielle Newby, Daniel Thornton, Taravat Ghafourian, João Pedro de Magalhães, Alex A. Freitas.

http://doi.org/10.18632/aging.101264

Abstract

Increasing age is a risk factor for many diseases; therefore developing pharmacological interventions that slow down ageing and consequently postpone the onset of many age-related diseases is highly desirable. In this work we analyse data from the DrugAge database, which contains chemical compounds and their effect on the lifespan of model organisms. Predictive models were built using the machine learning method random forests to predict whether or not a chemical compound will increase Caenorhabditis elegans’ lifespan, using as features Gene Ontology (GO) terms annotated for proteins targeted by the compounds and chemical descriptors calculated from each compound’s chemical structure. The model with the best predictive accuracy used both biological and chemical features, achieving a prediction accuracy of 80%. The top 20 most important GO terms include those related to mitochondrial processes, to enzymatic and immunological processes, and terms related to metabolic and transport processes. We applied our best model to predict compounds which are more likely to increase C. elegans’ lifespan in the DGIdb database, where the effect of the compounds on an organism’s lifespan is unknown. The top hit compounds can be broadly divided into four groups: compounds affecting mitochondria, compounds for cancer treatment, anti-inflammatories, and compounds for gonadotropin-releasing hormone therapies.

Partial Text

Old age is the greatest risk factor for many diseases, including various types of cancer and inflammatory and neurodegenerative diseases. Traditional medical science combats one disease at a time rather than the underlying biological ageing process that leads to many age-related diseases. From a whole-body systems point of view, this one-disease-at-a-time approach focuses on the downstream diseases and does not consider the underlying mechanisms of age-related functional decline. It has limited effectiveness at present and is likely to become less effective in the future, as an increasingly large elderly population suffers from multiple age-related diseases. In contrast, interventions that slow down ageing and promote “healthy ageing” could in principle delay the onset of all age-related diseases, with a significant benefit to human health and a large reduction in healthcare costs [1].

In this work we analysed data from the DrugAge database [10], which contains information about chemical compounds and their effect on the lifespan of organisms. We focused on compounds administered to C. elegans, since the majority of compounds in DrugAge have been evaluated in this model organism. For our data analysis, we used the machine learning method random forests to build classification models that predict whether or not a chemical compound will increase the lifespan of C. elegans, based on predictive features describing that compound. We built three types of classification models: one using only chemical descriptors, one using only Gene Ontology terms, and one using both types of features. The dataset with both types of features led to the highest predictive accuracy in our experiments.
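The three feature sets described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the compound records, GO term labels, descriptor names and values are all invented for illustration, and the classifier is a deliberately simplified stand-in for random forests (a bagged ensemble of one-split decision stumps, each trained on a bootstrap sample and a random subset of features). The function names `build_features`, `fit_stump`, `fit_forest` and `predict` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy records. "go" holds GO terms of targeted proteins, "chem" holds
# chemical descriptors, "extends" is 1 if the compound increased lifespan.
COMPOUNDS = {
    "cmpd_a": {"go": {"go_term_1", "go_term_2"}, "chem": {"logp": 2.1, "mol_weight": 310.0}, "extends": 1},
    "cmpd_b": {"go": {"go_term_1"}, "chem": {"logp": 1.7, "mol_weight": 280.0}, "extends": 1},
    "cmpd_c": {"go": {"go_term_3"}, "chem": {"logp": 4.9, "mol_weight": 520.0}, "extends": 0},
    "cmpd_d": {"go": {"go_term_2", "go_term_3"}, "chem": {"logp": 4.2, "mol_weight": 495.0}, "extends": 0},
    "cmpd_e": {"go": {"go_term_1", "go_term_3"}, "chem": {"logp": 2.8, "mol_weight": 330.0}, "extends": 1},
    "cmpd_f": {"go": set(), "chem": {"logp": 5.3, "mol_weight": 610.0}, "extends": 0},
}

def build_features(compounds, kind):
    """Return (feature_names, X, y) for kind in {"chem", "go", "both"}."""
    go_vocab = sorted({t for c in compounds.values() for t in c["go"]})
    chem_keys = sorted({k for c in compounds.values() for k in c["chem"]})
    names = (go_vocab if kind in ("go", "both") else []) + \
            (chem_keys if kind in ("chem", "both") else [])
    X, y = [], []
    for c in compounds.values():
        row = []
        if kind in ("go", "both"):
            # Binary indicator per GO term: 1.0 if annotated for a target protein.
            row += [1.0 if t in c["go"] else 0.0 for t in go_vocab]
        if kind in ("chem", "both"):
            row += [float(c["chem"][k]) for k in chem_keys]
        X.append(row)
        y.append(c["extends"])
    return names, X, y

def fit_stump(X, y, feature_ids):
    """Choose the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for thr in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagged stumps: bootstrap the rows and sample sqrt(d) features per stump."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), k)             # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps in the ensemble."""
    votes = [l if row[f] <= thr else r for f, thr, l, r in forest]
    return Counter(votes).most_common(1)[0][0]

if __name__ == "__main__":
    for kind in ("chem", "go", "both"):
        names, X, y = build_features(COMPOUNDS, kind)
        forest = fit_forest(X, y)
        acc = sum(predict(forest, r) == yi for r, yi in zip(X, y)) / len(y)
        print(f"{kind}: {len(names)} features, training accuracy {acc:.2f}")
```

The accuracy printed here is measured on the same toy rows used for training, so it says nothing about generalisation and is not comparable to the accuracy reported in the article. A real analysis would use full decision trees and held-out evaluation such as cross-validation.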

 


 
