Research Article: Better medicine through machine learning: What’s real, and what’s artificial?

Date Published: December 31, 2018

Publisher: Public Library of Science

Author(s): Suchi Saria, Atul Butte, Aziz Sheikh

Abstract: Machine Learning Special Issue Guest Editors Suchi Saria, Atul Butte, and Aziz Sheikh cut through the hyperbole with an accessible and accurate portrayal of the forefront of machine learning in clinical translation.

Partial Text: Of the myriad opportunities for use of ML in clinical practice, medical imaging workflows are most likely to be impacted in the near term. ML-driven algorithms that automatically process 2- or 3-dimensional image scans to identify clinical signs (e.g., tumors or lesions) or determine likely diagnoses have been published, and some are progressing through regulatory steps toward the market. Many of these use deep learning, a form of ML based on layered representations of variables, referred to as neural networks. To understand how deep learning methods leverage image data to perform recognition tasks, imagine you are entering a dark room and looking for the light switch. From past experience, you have learned to associate light switches with predictable locations within the configuration of a room. Many computer vision–based image processing algorithms, including deep learning, mimic this behavior to identify factors that are associated with the recognition task at hand. Deep learning is especially powerful in its ability to interpret images because of the complexity of the factors it can consider.

Prediction to aid preventative efforts is another promising frontier for improving outcomes using ML. For example, in the Special Issue, a study from Kristin Corey and colleagues considered the potential for reducing complications and mortality within 30 days following particular surgeries [7]. Using data from about 88,000 encounters extracted from June 2012 to June 2017, they developed software (Pythia) that incorporates a patient’s age, race, sex, medication, and comorbidity history to determine risk of complications or death post surgery. Overall, postsurgical complication rates were 16.0% for any complication within 30 days and 0.51% for death within 30 days. In a separate validation set of 12,000 encounters, at a threshold selected to have sensitivity of 0.75, Pythia achieves a positive predictive value of 0.35; in other words, 1 in 3 patients flagged by their approach have a postsurgical complication within 30 days. Comparison of Pythia’s scores to scores from The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) calculator on a smaller set of 75 encounters found that Pythia identifies higher-risk patients. A tool like Pythia can enable surgeons and referring clinicians to identify high-risk individuals who may require targeted assessments and optimization as part of their preoperative care. For example, a patient with anemia at high risk for a hematological complication such as bleeding may benefit from being put on iron transfused with blood prior to surgery or have medications managed to help mitigate the risk of losing blood during the procedure. The efficacy with which such algorithms can be operationalized to improve clinical adoption is a key question. Unlike in medical imaging applications, here the goal is to augment rather than automate existing workflows. Efforts testing such workflows in sepsis, a leading cause of death and one of the costliest complications, are underway at institutions such as Johns Hopkins and Duke, with the former system beginning to demonstrate benefit [8–10].

The definitions of diseases and disease subtypes we use today are based largely on the original symptom-based descriptions offered in the 17th century by Sydenham and Linnaeus and the organ-based definitions developed by Osler in the 20th century. It is, however, now possible to move beyond these observational approaches to more data-driven approaches to diagnosis and disease classification. In a series of experiments, Adnan Custovic and colleagues have been pursuing this approach in the context of asthma and allergy. Using unsupervised ML, the group analyzed data from the Manchester Asthma and Allergy Study (MAAS) population-based birth cohort and were able to identify novel phenotypes of childhood atopy [12]. Through further interrogation of this same dataset, the authors have now identified clusters of component-specific immunoglobulin E (IgE) sensitization using network and hierarchical cluster analysis that can help better predict risk of childhood asthma [13]. We believe there are considerable opportunities to employ similar data-driven approaches to aid diagnostic processes in other disease areas, and using ML methods to find new actionable disease subsets will be critical to advance precision medicine [14].

Medication errors are responsible for considerable—and potentially preventable—morbidity, mortality, and healthcare costs. These errors can be identified through a variety of means, including expert chart reviews, use of triggers, rules-based approaches to screening EMRs, and significant event audits. However, these approaches are associated with a number of challenges: suboptimal sensitivity and specificity, time consumption, and expense. ML-based anomaly detection techniques begin by developing a probabilistic model of what is likely to occur in a given context by using historical data. Using this model, a new event (e.g., medication given at a particular dose) within a specific context (e.g., individual patient characteristics) is flagged as anomalous if its probability of occurring within that context is very small. MedAware is a commercially available system that uses anomaly detection to generate medication error alerts. In a recent study, Gordon Schiff and colleagues used medical chart review to analyze the validity and clinical utility of these alerts [15] and found that three-quarters of the alerts generated by the screening system were valid according to the charts. Of these validated alerts, the majority (75.0%) were found to be clinically useful in flagging potential medication errors or issues. Such findings indicate that this approach has the potential to be incorporated into clinical use, although Schiff and colleagues do caution that the utility of this system is highly dependent on the quality and comprehensiveness of the underlying data.

We have discussed several examples of ML’s potential to transform medical care. However, naive implementation of ML without careful validation can also harm patients and the public. Consider, as an example, a hypothetical effort to predict the risk of emergency hospital admissions using a model trained on past admissions data for patients with various characteristics and symptoms. Actual admissions are often subject to bed availability, the type of insurance an individual is carrying, and reimbursement practices. Whereas this trained model might enable population-level resource planning, attempting to use it for individual-level triage may incorrectly classify an individual as not requiring an admission. To some extent, an ML algorithm can replicate past decisions, including biases around race and sex that may have influenced clinical judgement about the level of care given. “Irrational extrapolation”—the assumption that algorithms trained on an easy-to-obtain set of patients or data will lead to accurate models that act in each patient’s best interest—must be stringently avoided until algorithms can correct for such biases and use clinical data to reason about disease severity and trajectory.



Leave a Reply

Your email address will not be published.