Date Published: April 10, 2017
Publisher: Public Library of Science
Author(s): Alejandro Pironti, Nico Pfeifer, Hauke Walter, Björn-Erik O. Jensen, Maurizio Zazzi, Perpétua Gomes, Rolf Kaiser, Thomas Lengauer, Yoshihiro Yamanishi.
Antiretroviral treatment history and past HIV-1 genotypes have been shown to be useful predictors for the success of antiretroviral therapy. However, this information may be unavailable or inaccurate, particularly for patients with multiple treatment lines often attending different clinics. We trained statistical models for predicting drug exposure from current HIV-1 genotype. These models were trained on 63,742 HIV-1 nucleotide sequences derived from patients with known therapeutic history, and on 6,836 genotype-phenotype pairs (GPPs). The mean performance regarding prediction of drug exposure on two test sets was 0.78 and 0.76 (ROC-AUC), respectively. The mean correlation to phenotypic resistance in GPPs was 0.51 (PhenoSense) and 0.46 (Antivirogram). Performance on prediction of therapy-success on two test sets based on genetic susceptibility scores was 0.71 and 0.63 (ROC-AUC), respectively. Compared to geno2pheno[resistance], our novel models display a similar or superior performance. Our models are freely available on the internet via http://www.geno2pheno.org. They can be used for inferring which drug compounds have previously been used by an HIV-1-infected patient, for predicting drug resistance, and for selecting an optimal antiretroviral therapy. Our data-driven models can be periodically retrained without expert intervention as clinical HIV-1 databases are updated and therefore reduce our dependency on hard-to-obtain GPPs.
Prolonged chemotherapy against the human immunodeficiency virus type 1 (HIV-1) bears the risk of selection of resistant viral strains, ultimately leading to therapy failure [1–6]. Once a drug-resistant HIV-1 variant has been selected in a host, it can be transmitted to another host [6,7]. Furthermore, drug-resistant viral variants are permanently archived in the body of the host and can promptly reemerge if drug pressure gives them a competitive advantage over other viral variants. In order to prevent premature therapy failure, the susceptibility of an HIV-1 variant to available antiretroviral drugs can be measured phenotypically or genotypically [4,9–12]. Due to the high cost, limited accessibility and long turnaround time of phenotypic resistance assays, genotypic resistance determination has become the standard of care [4,9]. Phenotypic resistance assays afford direct, quantitative resistance assessments that take into account resensitizing mutations, as well as complex mutational patterns. However, certain drugs show significantly decreased in-vivo efficacy at very low in-vitro susceptibility changes which are close to the inherent variability of the phenotypic assay. Furthermore, viral strains with mutations that do not directly cause resistance, but are strongly associated with the emergence of drug resistance, may be deemed susceptible by in-vitro phenotypic drug-resistance assays. If the respective drugs are taken by patients harboring these strains, resistant variants will promptly emerge and compromise virologic response to therapy.
We trained models for predicting whether an HIV-1 variant had been previously exposed to a certain drug. One or two models were trained for each of the drugs considered in this study (Methods). Specifically, Exposure models were trained with HIV-1 sequences and information on drug exposure. The development sets of ExposurePheno models included genotype-phenotype pairs (GPPs) in addition to the data included in Exposure models. Since a sufficient number of HIV-1 sequences with information on drug exposure was not available for all drugs, Exposure models could not be trained for all drugs. Additionally, we trained a model for discriminating between HIV-1 sequences from treatment-naïve patients and HIV-1 sequences from treatment-experienced patients. In the following, we refer to a number of datasets that we used for training and validating our models. For the reader's convenience, we summarize the contents of each of these datasets in Table 1. Furthermore, we depict the relationships among the datasets in Figure A in S1 File.
DES models constitute data-driven interpretation systems for HIV-1 protease, reverse-transcriptase, and integrase sequences. Two versions of DES models were trained and tested. Specifically, one version of the models is trained solely on genotypes and drug exposure information (Exposure models), while the other version additionally includes GPPs (ExposurePheno models). When compared to ExposurePheno models, Exposure models show a high performance when predicting drug exposure, but their correlation with RFs and their performance when predicting antiretroviral therapy success are lower. We chose to include GPPs in the training sets of ExposurePheno models for the following reasons. Both drug exposure and drug resistance are predictive of success of antiretroviral therapy [34,38–40]. The major factor leading to viral drug resistance is exposure to antiretroviral drugs. Specifically, drug resistance arises through the selection of HIV-1 strains with mutations that confer a replicative advantage in the presence of the drug. Thus, drug exposure indirectly causes drug resistance, and both are therefore correlated with certain mutations in the genome of HIV-1. Nevertheless, drug exposure and drug resistance are not redundant, but can complement each other. For this reason, simultaneous interpretation of HIV-1 genotypes with respect to drug exposure and to drug resistance is useful for the prediction of the success of antiretroviral therapy. ExposurePheno models consider drug exposure and drug resistance jointly. To include GPPs in the training set of classification models, the RFs had to be categorized. Thus, we replaced the RFs in the GPPs with the labels susceptible and resistant. For labeling, the RF cutoffs of one and ten were applied to all GPPs, regardless of the drug-resistance test (Antivirogram or PhenoSense) and of the tested drug.
GPPs with RFs between one and ten were not used for training the models. When clinically relevant categorization of GPPs is intended, different cutoffs for each drug and drug resistance test must be used. However, rather than producing clinically relevant labels for training, we aimed at discriminating fully susceptible GPPs from those that have developed resistance to an extent well beyond the variability arising from the drug resistance test itself, for the following reasons. First, drug resistance is a continuum, and the creation of training instances with a clear separation in this continuum is adequate for the training of binary classification models. Second, clinically relevant cutoffs are selected under the (implicit) consideration of the pharmacokinetic properties of a drug. For example, the use of ritonavir as a booster for protease inhibitors (PIs) leads to an increased and sustained concentration of PIs in the body. For this reason, clinically relevant cutoffs for boosted PIs are shifted upwards with respect to their unboosted counterparts. However, we aim at discriminating viral sequences that display mutations as a consequence of drug exposure (or as the cause of resistance), without regard for drug concentrations in the blood of the patient. The cutoffs one and ten are adequate for combining GPPs produced with the Antivirogram and PhenoSense assays; if other assays are used, other cutoffs might need to be selected. One advantage of ExposurePheno models over Exposure models is their higher performance. Another advantage is that they can make use of an additional data source, the GPPs. The use of GPPs allowed for the training of models for two additional drugs (EVG and RPV).
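The categorization of GPPs described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the representation of a GPP as a (sequence, RF) tuple, and the treatment of the boundary values (an RF of exactly one counted as susceptible, an RF of exactly ten counted as resistant) are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# RF cutoffs taken from the text; boundary handling is an assumption.
SUSCEPTIBLE_CUTOFF = 1.0
RESISTANT_CUTOFF = 10.0


def label_rf(rf: float) -> Optional[str]:
    """Map an RF to a binary label, or None if it falls between the cutoffs."""
    if rf <= SUSCEPTIBLE_CUTOFF:
        return "susceptible"
    if rf >= RESISTANT_CUTOFF:
        return "resistant"
    return None  # intermediate RF: excluded from training


def label_gpps(gpps: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Replace the RF in each (sequence, RF) pair with a label.

    Pairs with an RF between the two cutoffs are dropped. The same cutoffs
    are used for every drug and for both assays, as stated in the text.
    """
    labeled = []
    for sequence, rf in gpps:
        label = label_rf(rf)
        if label is not None:
            labeled.append((sequence, label))
    return labeled


if __name__ == "__main__":
    # Hypothetical placeholder sequences and RFs, not real data.
    example = [("seqA", 0.8), ("seqB", 3.0), ("seqC", 25.0)]
    print(label_gpps(example))
```

In this sketch the pair with an RF of 3.0 is discarded, which leaves a gap between the two classes, in line with the stated aim of separating fully susceptible GPPs from clearly resistant ones.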