Will Machine Learning Replace Clinical Acumen?
50,000,000,000,000,000 bytes (50 petabytes) is the volume of health data generated by a healthcare system in a single year. We are currently only scratching the surface of the clinical insight that lies within these massive data sets, according to Review Course Lecturer Hannah Lonsdale, MBChB, who presented “Machine Learning for Anesthesiologists 101” at the IARS 2021 Annual Meeting. Dr. Lonsdale, a researcher and future pediatric anesthesia fellow at Johns Hopkins University, offered a deep dive into the mind of artificial intelligence.
Machine learning uses algorithms and mathematical functions to develop models that perform human-like functions such as problem-solving, object and word recognition, and decision-making. Dr. Lonsdale used the analogy of a medical student studying for the USMLE to explain how machines are trained to learn. The student studies a bank of test questions where the rationale can also be applied to future questions. In the same way, the machine model absorbs an enormous volume of training data, then develops statistical rules to use on future data for predicting clinical outcomes.
After the model has “learned” the data, data scientists perform external validation. Electronic medical record (EMR) data is pipelined out for machine learning, and the predicted clinical outcomes are then verified against what occurs in actual patients. Eventually outcomes are shared with providers to offer clinical insight and enhance decision-making, but never to replace clinician judgment.
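The train-then-validate workflow described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the single risk feature, the made-up patient values, and the helper names `fit_threshold` and `accuracy` are hypothetical, and a real clinical model would be far more complex than a one-feature cutoff.

```python
def fit_threshold(values, outcomes):
    """Pick the cutoff on one feature that classifies the training data best."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(values)):
        # Predict "outcome occurs" whenever the feature is at or above the cutoff.
        correct = sum((v >= cutoff) == bool(y) for v, y in zip(values, outcomes))
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def accuracy(cutoff, values, outcomes):
    """Fraction of patients whose predicted outcome matches what actually occurred."""
    hits = sum((v >= cutoff) == bool(y) for v, y in zip(values, outcomes))
    return hits / len(values)


# Hypothetical training cohort: one made-up risk feature per patient, 1 = outcome occurred.
train_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_outcomes = [0, 0, 0, 1, 1, 1]

# Hypothetical external cohort that the model never saw during training.
external_values = [1.5, 2.5, 3.5, 4.5, 5.5]
external_outcomes = [0, 0, 1, 1, 1]

cutoff = fit_threshold(train_values, train_outcomes)
train_accuracy = accuracy(cutoff, train_values, train_outcomes)
external_accuracy = accuracy(cutoff, external_values, external_outcomes)
print(cutoff, train_accuracy, external_accuracy)
```

In this toy data the rule fits the training cohort perfectly but misses one patient in the external cohort, which is the kind of gap that validation against actual patient outcomes is meant to expose.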
One of the most successful examples of machine learning in medicine to date is Machine Vision, a project for detection of pathology from retinal images using massive data sets from ophthalmology, radiology, and pathology for clinical decision-making.
Dr. Lonsdale posed the question, “Why use a machine when a trained, human biostatistician could analyze the data?” The truth is that the volume, variety, and velocity of big data sets can be difficult for humans to process. Artificial intelligence can explore complex, nonlinear relationships that would not be possible to explore with biostatistical manpower alone. Hypotheses can be created, but causality cannot be inferred.
What exactly happens inside the mysterious black box of machine learning? Dr. Lonsdale probed deeper into the statistical enigma of big data and is almost convinced that these methods are as straightforward as our own motor vehicles. Machine-learning models are evaluated with a statistical measure called area under the receiver operating characteristic curve (AUROC). The AUROC graph depicts 1 − specificity (the false positive rate) on the x-axis against sensitivity (the true positive rate) on the y-axis, where the best performing model would have an AUROC close to 1. A value of 0.5 indicates a model performing no better than random chance.
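One way to make AUROC concrete is to use its equivalent interpretation: the probability that a randomly chosen patient who had the outcome receives a higher risk score than a randomly chosen patient who did not. The sketch below computes it that way on hypothetical toy data; the function name `auroc` and the scores are illustrative assumptions, not taken from the lecture.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random positive case outranks a random negative case.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.8, 0.9]

toy_auroc = auroc(labels, scores)                 # 8 of 9 positive/negative pairs are ranked correctly
perfect_auroc = auroc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
chance_auroc = auroc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])
print(toy_auroc, perfect_auroc, chance_auroc)
```

A model that ranks every positive case above every negative case scores 1, while a model that gives every patient the same score lands at 0.5, matching the "no better than chance" reading above.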
Dr. Lonsdale recommended the article by Hofer et al., “The development and validation of a deep neural network model to predict postoperative mortality, acute kidney injury, and reintubation using a single feature set,” for a better understanding of AUROC. Because outcomes examined in anesthesia generally have a low prevalence (e.g., postoperative mortality and acute kidney injury), AUROC is useful for assessing a population but not individual patients. The reason is that AUROC functions independently of prevalence.
The area under the precision-recall curve (AUPRC) is an alternative statistical measure that shows how prevalence affects model performance. AUPRC plots recall (sensitivity, the true positive rate) on the x-axis against precision (positive predictive value) on the y-axis. Positive predictive value (PPV) depends on prevalence. Arvind et al., in their article, “Development of machine learning algorithm to predict intubation among hospitalized patients with COVID-19,” demonstrate this method.
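The dependence of PPV on prevalence follows directly from Bayes' rule, and a short worked example may help. The sketch below holds sensitivity and specificity fixed and changes only the prevalence; the function name `ppv` and the numbers (90% sensitivity, 90% specificity, 50% versus 1% prevalence) are hypothetical values chosen for illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: the share of positive predictions that are true positives."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_share / (true_positive_share + false_positive_share)


# Same hypothetical model (90% sensitivity, 90% specificity) applied to two populations.
ppv_common = ppv(0.9, 0.9, 0.50)  # outcome occurs in half of patients
ppv_rare = ppv(0.9, 0.9, 0.01)    # outcome occurs in 1 of 100 patients
print(round(ppv_common, 3), round(ppv_rare, 3))
```

With these assumed numbers, PPV falls from 0.9 to roughly 0.083 when the outcome becomes rare, even though sensitivity and specificity (and therefore the ROC curve) are unchanged. That is why a precision-based measure can be more informative for low-prevalence outcomes.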
The future of machine learning includes tailored recommendations for patients. A model using pediatric algorithms to determine intraoperative blood transfusion requirements is on the horizon. In 5 years, a digital twin will be a reality, using machine learning that draws on the patient genome, EMR data, diagnostic data, and lifestyle factors. This twin model will allow clinicians to determine the best treatment options for the live patient.
Incorporating machine learning into clinical practice is in the early stages. We confidently trust our motor vehicles, not because we understand the complexities of auto mechanics, but because we trust the safety regulations. When clinicians become more familiar with the methodology, FDA regulatory processes, and important legal and ethical considerations surrounding machine learning, they will be able to take advantage of the massive data stores awaiting exploration.
International Anesthesia Research Society