Miao Brenda Y, Williams Christopher Y K, Chinedu-Eneh Ebenezer, Zack Travis, Alsentzer Emily, Butte Atul J, Chen Irene Y
Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA.
Department of Medicine, University of California San Francisco, San Francisco, CA, USA.
NPJ Digit Med. 2025 Apr 23;8(1):221. doi: 10.1038/s41746-025-01615-0.
Understanding reasons for treatment switching is of significant medical interest, but these factors are often found only in unstructured clinical notes and can be difficult to extract. We evaluated the zero-shot abilities of GPT-4 and eight open-source large language models (LLMs) to extract contraceptive switching information from 1,964 clinical notes derived from the UCSF Information Commons dataset. GPT-4 extracted the contraceptives started and stopped at each switch with micro-F1 scores of 0.85 and 0.88, respectively, compared with 0.81 and 0.88 for the best open-source model. When evaluated by clinical experts, GPT-4 extracted reasons for switching with an accuracy of 91.4% and a hallucination rate of 2.2%. Transformer-based topic modeling identified patient preference, adverse events, and insurance coverage as key reasons. These findings demonstrate the value of LLMs in identifying complex treatment factors and provide insights into reasons for contraceptive switching in real-world settings.
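For readers unfamiliar with the metric, the micro-F1 scores above pool true positives, false positives, and false negatives across all evaluated items before computing precision and recall, rather than averaging per-item scores. The sketch below illustrates that calculation under stated assumptions. It assumes each switch is scored by comparing a set of predicted contraceptive names against a set of gold-standard names with exact matching. The abstract does not specify the authors' matching or normalization rules. The helper `micro_f1` and the example data are hypothetical.

```python
from typing import Iterable, Set, Tuple


def micro_f1(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Micro-averaged F1 over (predicted, gold) label sets.

    Hypothetical helper: assumes exact string matching of already-normalized
    contraceptive names. Counts are pooled across all items before
    precision and recall are computed.
    """
    tp = fp = fn = 0
    for predicted, gold in pairs:
        tp += len(predicted & gold)  # extracted and present in the gold label
        fp += len(predicted - gold)  # extracted but not in the gold label
        fn += len(gold - predicted)  # in the gold label but missed
    if tp == 0:
        return 0.0  # avoids division by zero when nothing was matched
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: contraceptives "started" at three switches.
examples = [
    ({"IUD"}, {"IUD"}),                  # exact match: 1 TP
    ({"implant", "pill"}, {"implant"}),  # 1 TP, 1 spurious extraction (FP)
    (set(), {"patch"}),                  # 1 missed contraceptive (FN)
]
print(round(micro_f1(examples), 3))  # TP=2, FP=1, FN=1 -> 0.667
```

Because the counts are pooled, switches that involve several contraceptives carry proportionally more weight than they would under a per-switch (macro) average.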