The Daily Dose • Sunday, May 19, 2024

Using AI to Work with You, Not Instead of You

Young May Cha, MD

Artificial intelligence (AI) is upending major industries, and health care is no exception. Anesthesiology is an inherently technology-rich field, and anesthesiologists stand to benefit from the tools AI provides. In the session, “Generative AI and Medicine – Will This Change the Practice of Anesthesiology,” the panelists offered examples of how AI can help us work more efficiently and explained the practical limitations of this technology. Michael Burns, MD, PhD, assistant professor at the University of Michigan, moderated the session on Saturday, May 18, at the 2024 Annual Meeting, presented by IARS and SOCCA.

Christopher W. Connor, MD, PhD, associate professor of anaesthesia at Harvard Medical School, explained that AI has not seen major theoretical breakthroughs in recent years. Rather, what has been transformative is the development of new large-scale architectures that allow machine learning to be performed on extremely large datasets. He highlighted a recent review in Anesthesiology that dives deeper into how these new machine learning architectures are being used in anesthesiology.

Dr. Connor described three classes of generative AI models. Auto-encoders and U-nets extract meaning and salience from images: once such a model learns the representation of a “happy face” versus an “unhappy face,” it can change an image from one to the other. In anesthesiology, this kind of model could annotate images in real time during an ultrasound-guided regional nerve block; a brief sketch of the idea follows below. Stable diffusion and related image-creation models convert noise into artwork: the network is trained to add noise to an image and then remove it, so that running the denoising process on pure noise produces a new image. Transformers and large language models (LLMs) leverage massive computing power to model sequential tasks. In practice, however, they have been challenging to apply clinically because they do not prioritize information; in a case of massive hemorrhage requiring transfusion, the model cannot single out the critical parameter and act as quickly as a clinician would.
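To make the auto-encoder idea concrete, here is a minimal sketch (not from the session, and assuming a PyTorch environment): an image is compressed into a small latent representation and then reconstructed from it. The layer sizes and the 28-by-28 "image" are illustrative assumptions only.

# A minimal sketch of the auto-encoder idea: compress an image into a small
# latent vector, then reconstruct it. Editing that latent code is what allows
# a model to shift an image from one learned representation to another.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, image_dim=28 * 28, latent_dim=16):
        super().__init__()
        # Encoder: squeeze the image down to a compact latent vector
        self.encoder = nn.Sequential(
            nn.Linear(image_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: rebuild the image from that latent vector
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, image_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)           # latent "meaning" of the image
        return self.decoder(z), z     # reconstruction plus the latent code

model = TinyAutoencoder()
fake_image = torch.rand(1, 28 * 28)   # stand-in for a single image frame
reconstruction, latent = model(fake_image)
print(latent.shape)                    # torch.Size([1, 16])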

Vesela P. Kovacheva, MD, PhD, assistant professor of anaesthesia at Harvard Medical School, then provided examples of AI use in healthcare. Natural language processing (NLP) is a field of AI that uses algorithms to understand, manipulate, and generate human language. LLMs, of which ChatGPT is the most famous, are machine learning models that generate human-like text. When LLMs were used to aid clinical documentation, about half of the generated discharge summaries were judged equivalent to those written by clinicians, and over a third were rated superior. Generative AI may also assist with clinical research and quality improvement projects: models such as FlanT5 identified patients with postpartum hemorrhage more accurately than traditional methods, such as ICD codes.
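As a rough illustration of that kind of phenotyping, the sketch below (assuming the Hugging Face transformers library is installed) prompts a Flan-T5 model to flag postpartum hemorrhage in a free-text note. The note and prompt are invented for illustration and are not from the study described.

# A minimal sketch of using a Flan-T5 model to screen an obstetric note for
# postpartum hemorrhage, instead of relying on ICD codes alone.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

note = ("Delivery complicated by uterine atony with estimated blood loss "
        "of 1500 mL; two units of packed red blood cells transfused.")

prompt = (
    "Does the following obstetric note describe a postpartum hemorrhage? "
    "Answer yes or no.\n\n" + note
)

result = generator(prompt, max_new_tokens=5)
print(result[0]["generated_text"])   # expected output along the lines of "yes"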

Not only can LLMs alleviate documentation burden, they can also translate medical text into more patient-friendly language. Chatbot responses have even been rated as more empathetic than physician responses. However, LLMs, like all generative AI models, raise safety concerns. Omissions of information have been noted and, perhaps more alarmingly, LLMs can produce “hallucinations,” fabricating information outright. These issues must be addressed before these models are used for patient care. So, even though GPT-4 could pass the ABA basic exam and oral boards, it is nowhere near ready to be an anesthesiologist.

Hannah Lonsdale, MBChB, assistant professor of pediatric anesthesiology at Vanderbilt University, rounded out the session with caution and concern about the use of AI. These models make predictions from internal representations that are largely trained on publicly available content, which is not necessarily appropriate or correct. Once released, the models are locked, and their training data may not be current at the time of use. They also tend to produce overgeneralized information, along with the hallucinations mentioned earlier, and their output is not of high enough quality for patient care or academic work. Instead, Dr. Lonsdale suggests these models are better suited as assistants that provide drafts for editing or interpret medical articles for public use.

Prompts to models like ChatGPT can be broken down into three general patterns. First is the “persona pattern,” in which you ask the model to act in a specific capacity, such as a critical reviewer, or to assume the role of an anesthetist reviewing a case history and scoring ASA status. Second is “few-shot prompting,” in which you give the model a few examples of the task, along with context and answers. Third is “chain-of-thought prompting,” in which you give the model a template or a step-by-step process so it can work toward a solution. These prompt-engineering techniques can be combined to generate an even more detailed prompt, as in the sketch below.
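The following sketch (not from the session) combines the three patterns into a single prompt. The clinical details are invented, and the actual call to a model is omitted because it depends on whichever LLM interface you use.

# Persona pattern: assign the model a role
persona = (
    "You are an experienced anesthesiologist reviewing preoperative case "
    "histories and assigning an ASA physical status."
)

# Few-shot prompting: show the task with worked examples
few_shot_examples = (
    "Example 1: Healthy 25-year-old, no medications -> ASA I\n"
    "Example 2: 60-year-old with controlled hypertension and type 2 diabetes -> ASA II"
)

# Chain-of-thought prompting: give a step-by-step template
chain_of_thought = (
    "Work step by step: 1) list the comorbidities, 2) judge how well each is "
    "controlled, 3) state the ASA class with a one-sentence rationale."
)

case = "Case: 70-year-old with COPD on home oxygen presenting for hernia repair."

prompt = "\n\n".join([persona, few_shot_examples, chain_of_thought, case])
print(prompt)   # send to your LLM of choice; never include real patient data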

Importantly, Dr. Lonsdale cautions us never to enter patient data into these models. Any information put into a public LLM is uploaded to the model’s servers and constitutes a violation of patient privacy. For this reason, some institutions are beginning to build their own in-house LLMs to mitigate this concern. Additionally, AI use is not permissible in all journals; always refer to the author guidelines and take careful note of how AI usage must be disclosed. Even when LLM use is allowed, it is essential to manually check everything the LLM writes. False citations are a notorious LLM hallucination.