Large language models (LLMs) are complex computer algorithms built on artificial neural networks, a technique loosely inspired by how neurons in the brain connect, that can recognize and generate human-like language. The release of a chatbot based on an LLM in 2022 raised questions about the opportunities and risks of LLM applications in healthcare.
Zachary Grinspan, MD, MS, vice chair of health data science in the Department of Pediatrics and director of the Pediatric Epilepsy Program at NewYork-Presbyterian and Weill Cornell Medicine, serves on the American Academy of Neurology’s Quality Informatics Subcommittee. The subcommittee is particularly interested in how electronic health records and new digital technologies can help with quality measurement and improvement. Members of this group, including Dr. Grinspan, recently published an article in the journal Neurology on the implications of LLMs for the quality and efficiency of neurologic care.
Below, Dr. Grinspan discusses this article and shares how these models are being integrated into neurologic care at NewYork-Presbyterian and Weill Cornell Medicine.
The Benefits and Obstacles of Utilizing LLMs Effectively
Quality improvement (QI) in medicine is about identifying opportunities to optimize care delivery so our patients have better outcomes, and then acting on those opportunities. The labor of QI often includes reviewing narrative charts to extract key information for measurement. LLMs could help with that work. My colleagues and I were interested to see how this might be done for neurologic care.
The technology underlying LLMs is powerful, with the potential to affect the practice of medicine in multiple domains. So another goal of our article was to raise awareness of the technology and to understand both its risks and its benefits. One risk is hallucination, where the LLM makes something up out of thin air. We also worry about bias. LLMs are trained on what people write, and we know that our writing is biased. In older medical writing, for example, there may be racial and ethnic biases in how the data are presented or analyzed. A human reader may recognize and call out those biases, while an LLM may incorporate them into its responses. Thus, to use LLMs effectively, we must be aware of these biases, guard against them, and measure them.
One of the significant obstacles to using LLMs is that the privacy rules around health records are much stricter than those for other kinds of data. To use a commercial LLM, we need to have business associate agreements in place, and those have been surprisingly challenging to negotiate. At Weill Cornell Medicine, we have a longstanding relationship with Google and the Google Cloud Platform, and we’ve been able to begin piloting their LLM, called Gemini. Further, our research informatics group, led by Thomas Campion and John Ruffing, has been very forward-thinking in ensuring we could leverage this preexisting relationship so that researchers can learn how to use these LLMs.
Among those who seized this opportunity is Katherine Cunnane, a Weill Cornell Medicine medical student, who took the initiative to learn and apply these advanced LLMs in her research. She had previously studied which symptoms most affect the lives of children with developmental and epileptic encephalopathy (DEE). We decided to test the idea that a core set of symptoms is consistent across all DEEs, and to use LLMs to increase the number of charts we could review. We found that the LLM identified many of these core symptoms, often with 90% accuracy or higher. This suggests the LLM may be a valuable tool to help screen charts for these symptoms. She then looked at what the LLM was telling us about the population of children, and the distribution of these symptoms was almost identical to the distribution she had found in her own chart review. This further illustrates the potential of LLMs in medicine.
Many people in the medical community are excited about the potential of LLMs to help with tasks such as summarizing a patient’s history, conducting an interview, and writing clinical notes. The technology most likely to enter our clinical practice first is the virtual scribe. There are already commercial products that can record a conversation and produce a first draft of your clinical notes. Although this can be a huge help, clinicians must remember that these technologies do not replace our medical expertise. Several healthcare centers have taken early steps to integrate LLMs into neurologic care. However, these initial applications are primarily exploratory, and further in-depth research is needed to understand the true potential of LLMs.