Date Published: October 4, 2018
Publisher: Public Library of Science
Author(s): Benoit Playe, Chloé-Agathe Azencott, Véronique Stoven, Alexandre G. de Brevern.
Adverse drug reactions, also called side effects, range from mild to fatal clinical events and significantly affect the quality of care. Among other causes, side effects occur when drugs bind to proteins other than their intended target. As experimentally testing drug specificity against the entire proteome is out of reach, we investigate the application of chemogenomics approaches. We formulate the study of drug specificity as a problem of predicting interactions between drugs and proteins at the proteome scale. We build several benchmark datasets and propose NN-MT, a multi-task Support Vector Machine (SVM) algorithm that is trained on a limited number of data points in order to solve the computational issues of proteome-wide SVM for chemogenomics. We compare NN-MT to different state-of-the-art methods and show that its prediction performance is similar or better, while remaining computationally efficient. Compared to its competitors, the proposed method is particularly effective at predicting (protein, ligand) interactions in the difficult double-orphan case, i.e. when no interactions are previously known for either the protein or the ligand. The NN-MT algorithm appears to be a good default method, providing state-of-the-art or better performance across the wide range of prediction scenarios considered in the present study: proteome-wide prediction, protein family prediction, test (protein, ligand) pairs dissimilar to pairs in the train set, and orphan cases.
As mentioned in Section 1.2, a few methods have been proposed to predict interactions between proteins and ligands. We compared the prediction performance of the proposed NN-MT method to that of two state-of-the-art methods: a recent matrix factorization method called Neighborhood Regularized Logistic Matrix Factorization (NRLMF), and the Kronecker (kernel) Regularized Least Squares regression method KronRLS (a kernel-based method, like NN-MT) [18, 19].
The present study tackles the prediction of ligand specificity on a large scale in the space of proteins. More precisely, our goal was to propose a method to explore the specificity of molecules with state-of-the-art or better performance over a wide range of prediction situations: at the proteome or protein family scale, on average or in specific situations such as tested pairs far from the train set, or orphan proteins and ligands. In other words, the aim was to propose a robust default method, applicable to many types of studies, thus sparing non-expert users the development of complex, ad hoc, problem-specific methods. We chose to formulate it as a problem of predicting (protein, ligand) interactions within a multi-task framework based on SVM and Kronecker products of kernels on proteins and molecules. Among the kernel-based SVM methods tested in the Results section, we showed that the NN-MT method fulfills these requirements. In particular, NN-MT outperforms both the multi-task MT method and the corresponding single-task kernel-based methods, while keeping a computational cost close to that of single-task approaches. It leads to the best prediction performance for the three tested settings, which cover most of the prediction situations that would be encountered in real-case studies.
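To make the Kronecker product of kernels mentioned above more concrete, the following minimal sketch computes the kernel between two (protein, ligand) pairs as the product of a protein kernel value and a molecule kernel value. It is an illustration only, not the authors' implementation: the function name `pair_kernel`, the toy kernel matrices, and the list of training pairs are all hypothetical, and the kernel values are made up.

```python
import numpy as np


def pair_kernel(K_prot, K_mol, pairs_a, pairs_b):
    """Gram matrix between two lists of (protein_index, molecule_index) pairs.

    Entry (i, j) equals K_prot[p_i, p_j] * K_mol[m_i, m_j], which is the
    corresponding entry of the Kronecker product of the two kernels,
    computed only for the requested pairs (hypothetical helper).
    """
    pa = np.asarray(pairs_a)
    pb = np.asarray(pairs_b)
    prot_part = K_prot[np.ix_(pa[:, 0], pb[:, 0])]
    mol_part = K_mol[np.ix_(pa[:, 1], pb[:, 1])]
    return prot_part * mol_part


# Toy similarity matrices with made-up values: 2 proteins, 3 molecules.
K_prot = np.array([[1.0, 0.5],
                   [0.5, 1.0]])
K_mol = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])

# Hypothetical training set of (protein, molecule) index pairs.
train_pairs = [(0, 0), (0, 1), (1, 2)]
K_train = pair_kernel(K_prot, K_mol, train_pairs, train_pairs)
```

Computing only the entries for the pairs actually used avoids building the full Kronecker product, whose size grows with the number of proteins times the number of molecules. A Gram matrix such as `K_train` could then be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.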