Research Article: Predicting Anatomical Therapeutic Chemical (ATC) Classification of Drugs by Integrating Chemical-Chemical Interactions and Similarities

Date Published: April 13, 2012

Publisher: Public Library of Science

Author(s): Lei Chen, Wei-Ming Zeng, Yu-Dong Cai, Kai-Yan Feng, Kuo-Chen Chou, Ozlem Keskin.


The Anatomical Therapeutic Chemical (ATC) classification system, recommended by the World Health Organization, categorizes drugs into different classes according to their therapeutic and chemical characteristics. For a set of query compounds, how can we identify which ATC-class (or classes) they belong to? It is an important and challenging problem because the information thus obtained would be quite useful for drug development and utilization. By hybridizing the information on chemical-chemical interactions and chemical-chemical similarities, a novel method was developed for this purpose. It was observed by the jackknife test on a benchmark dataset of 3,883 drug compounds that the overall success rate achieved by the prediction method was about 73% in identifying the drugs among the following 14 main ATC-classes: (1) alimentary tract and metabolism; (2) blood and blood forming organs; (3) cardiovascular system; (4) dermatologicals; (5) genitourinary system and sex hormones; (6) systemic hormonal preparations, excluding sex hormones and insulins; (7) anti-infectives for systemic use; (8) antineoplastic and immunomodulating agents; (9) musculoskeletal system; (10) nervous system; (11) antiparasitic products, insecticides and repellents; (12) respiratory system; (13) sensory organs; (14) various. Such a success rate is substantially higher than the roughly 7% expected from random guessing among 14 classes. It has not escaped our notice that the current method can be straightforwardly extended to identify the drugs for their 2nd-level, 3rd-level, 4th-level, and 5th-level ATC-classifications once statistically significant benchmark data are available for these lower levels.

Partial Text

Nowadays, the Anatomical Therapeutic Chemical (ATC) classification system, recommended by the World Health Organization (WHO), is the most widely recognized classification system for drugs. This classification system divides drugs into different groups according to the organ or system on which they act and/or their therapeutic and chemical characteristics. Accordingly, the ATC classification is very helpful for studying the utilization of drugs and categorizing them according to different purposes, therapeutic properties, and chemical and pharmacological properties (see Report of the WHO Expert Committee, 2005; World Health Organ Tech Rep, Ser:1–119). In the ATC classification system, drugs are classified into 14 main classes. To understand this complicated classification system, some efforts have been made [1], [2]. In a pioneering study, Gurulingappa et al. [2] proposed a method to study the ATC-classification system by combining information extraction and machine learning techniques. However, their method can be used to identify drug compounds only within the class of “Cardiovascular System”, one of the 14 main ATC classes.

Recently, information on protein-protein interactions has been used for predicting various attributes of proteins (see, e.g., [11], [12], [13]), implying that interactive proteins are more likely to share common biological functions [11] than non-interactive ones [14]. Likewise, two interactive drug compounds are more likely to have similar biological functions. Indeed, it is generally accepted that compounds with similar physicochemical properties are often involved in similar biological activities [1]. Accordingly, it is reasonable to assume that interactive drugs are likely to belong to the same ATC-class, and so are drugs with similar structures. Based on this rationale, let us construct the following benchmark to develop a new method for identifying the ATC-classes of drugs.
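The assumption above (interactive drugs, and structurally similar drugs, tend to share an ATC-class) can be illustrated with a minimal nearest-neighbor sketch. This is not the authors' implementation: the drug names, scores, and function names below are hypothetical, and the rule of trying interactions first and falling back to similarity is only one plausible way of combining the two information sources. The evaluation loop is a plain leave-one-out (jackknife) pass over the toy data.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical toy data; names, labels and scores are illustrative only.
ATC_CLASSES: Dict[str, Set[str]] = {
    "drugA": {"C"},  # cardiovascular system
    "drugB": {"C"},
    "drugC": {"N"},  # nervous system
    "drugD": {"N"},
}

# Symmetric interaction confidence scores; unlisted pairs have no known interaction.
INTERACTIONS: Dict[FrozenSet[str], float] = {
    frozenset({"drugA", "drugB"}): 0.9,
    frozenset({"drugA", "drugC"}): 0.2,
}

# Symmetric structural similarity scores; unlisted pairs are treated as 0.
SIMILARITIES: Dict[FrozenSet[str], float] = {
    frozenset({"drugA", "drugB"}): 0.7,
    frozenset({"drugC", "drugD"}): 0.8,
}


def nearest_neighbor(query: str,
                     scores: Dict[FrozenSet[str], float],
                     candidates: List[str]) -> Optional[str]:
    """Return the candidate with the highest positive score to the query, if any."""
    best, best_score = None, 0.0
    for other in candidates:
        if other == query:
            continue
        score = scores.get(frozenset({query, other}), 0.0)
        if score > best_score:
            best, best_score = other, score
    return best


def predict_atc(query: str, training: List[str]) -> Set[str]:
    """Predict ATC classes from the best interacting neighbor,
    falling back to the most similar drug when no interaction is known."""
    neighbor = nearest_neighbor(query, INTERACTIONS, training)
    if neighbor is None:
        neighbor = nearest_neighbor(query, SIMILARITIES, training)
    return set(ATC_CLASSES[neighbor]) if neighbor is not None else set()


def jackknife_accuracy(drugs: List[str]) -> float:
    """Leave each drug out in turn, predict it from the rest, count the hits."""
    hits = 0
    for drug in drugs:
        training = [d for d in drugs if d != drug]
        if predict_atc(drug, training) & ATC_CLASSES[drug]:
            hits += 1
    return hits / len(drugs)


if __name__ == "__main__":
    print(predict_atc("drugD", ["drugA", "drugB", "drugC"]))
    print(jackknife_accuracy(list(ATC_CLASSES)))
```

In this toy setting, drugD has no interaction partner, so its class is taken from its most similar neighbor; drugs with interaction data never reach the similarity step. A prediction counts as a hit here if it shares at least one class with the true label, which is an assumption made for the sketch rather than the scoring rule of the paper.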

For clarity, the original benchmark dataset of 3,883 drugs (cf. Supporting Information S1) can be separated into two subsets (Eq. 15): the first contains the 2,144 drugs that had chemical-chemical interaction information, while the second contains the drugs that had no chemical-chemical interaction information. Listed in Table 2 are the results obtained by the aforementioned three different prediction methods in identifying the 14 main ATC classes for the drugs investigated. By examining the table, we can observe the following.