Date Published: November 6, 2018
Publisher: Public Library of Science
Author(s): Effy Vayena, Alessandro Blasimme, I. Glenn Cohen
Abstract: Effy Vayena and colleagues argue that machine learning in medicine must offer data protection, algorithmic transparency, and accountability to earn the trust of patients and clinicians.
Partial Text: MLm algorithms use data that are subject to privacy protections, requiring that developers pay close attention to ethical and regulatory restrictions at each stage of data processing. Data provenance and consent for use and reuse are of particular importance [4,5], especially for MLm that requires considerable amounts and large varieties of data. It is very likely that such disparate data will have different conditions of use and/or be bound by different legal protections. A prominent example is the newly enacted European General Data Protection Regulation (GDPR), which sets out specific informed consent requirements for data uses and grants data subjects several rights that must be respected by those processing their data . Moreover, this law applies to data from residents of the European Union (EU) irrespective of where the data are processed. Data that are used to train algorithms must have the necessary use authorizations, but determining which data uses are permitted for a given purpose is not an easy feat. This will also depend on data type, jurisdiction, purpose of use, and oversight models.
The computer science adage goes, “garbage in, garbage out.” This is especially true for MLm, since the data sets on which MLm models are trained and validated are essential in ensuring the ethical use of predictive algorithms. Poorly representative training data sets can introduce biases into MLm-trained algorithms. “Bias” is a fraught term, with at least two archetypes common in medical data. First are cases in which the data sources themselves do not reflect true epidemiology within a given demographic, as for instance in population data biased by the entrenched overdiagnosis of schizophrenia in African Americans . Second are cases in which an algorithm is trained on a data set that does not contain enough members of a given demographic—for instance, an algorithm trained mostly on data from older white men. Such an algorithm would make poor predictions, for example, among younger black women. If algorithms trained on data sets with these characteristics are adopted in healthcare, they have the potential to exacerbate health disparities .
Perhaps the MLm raising the most difficult ethical and legal questions—and the greatest challenge to current modes of regulation—is represented by noninterpretable, so-called black-box algorithms, the inner logic of which remains hidden even to their developers. This lack of transparency can preclude the mechanistic interpretation of MLm-based assessments and, in turn, reduce their trustworthiness. Moreover, the disclosure of basic yet meaningful details about medical treatment to patients—a fundamental tenet of medical ethics—requires that the doctors themselves grasp at least the fundamental inner workings of the devices they use. Therefore, for MLm to be ethical, developers must communicate to their end users—doctors—the general logic behind MLm-based decisions. Some degree of explainability may also be required to justify the clinical validation of MLm in prospective studies and randomized clinical trials. In the case of fully automated medical decisions, the level of risk associated with the procedure may determine whether and how information should be provided to patients about the presence of MLm-based technologies employed to guide their care. Communicating with patients about the use of MLm technologies may increase their trust and acceptance, which the survey data discussed above suggest is an ongoing concern.
The clinical use of MLm may transform existing modes of healthcare delivery. MLm will be used in the clinical setting by healthcare professionals, be embedded in smart devices through the internet of things, and be used by patients themselves beyond the clinical setting for disease self-management of chronic conditions. The exponential growth of investment in MLm signals that research is accelerating, and more products may soon be targeting market entry. To merit the trust of patients and adoption by providers, MLm must fully align with data protection requirements, minimize the effects of bias, be effectively regulated, and achieve transparency. Addressing such ethical and regulatory issues as soon as possible is essential for avoiding unnecessary risks and pitfalls that will hinder further progress of MLm.