Bannett Yair, Gunturkun Fatma, Pillai Malvika, Herrmann Jessica E, Luo Ingrid, Huffman Lynne C, Feldman Heidi M
Division of Developmental-Behavioral Pediatrics, Stanford University School of Medicine, Stanford, California.
Stanford Quantitative Sciences Unit, Stanford, California.
Pediatrics. 2025 Jan 1;155(1). doi: 10.1542/peds.2024-067223.
To assess the accuracy of a large language model (LLM) in measuring clinician adherence to practice guidelines for monitoring side effects after prescribing medications for children with attention-deficit/hyperactivity disorder (ADHD).
Retrospective population-based cohort study of electronic health records. Cohort included children aged 6 to 11 years with ADHD diagnosis and 2 or more ADHD medication encounters (stimulants or nonstimulants prescribed) between 2015 and 2022 in a community-based primary health care network (n = 1201). To identify documentation of side effects inquiry, we trained, tested, and deployed an open-source LLM (LLaMA) on all clinical notes from ADHD-related encounters (ADHD diagnosis or ADHD medication prescription), including in-clinic/telehealth and telephone encounters (n = 15 628 notes). Model performance was assessed using holdout and deployment test sets, compared with manual medical record review.
The LLaMA model accurately classified notes that contained side effects inquiry (sensitivity = 87.2, specificity = 86.3, area under curve = 0.93 on holdout test set). Analyses revealed no model bias in relation to patient sex or insurance. Mean age (SD) at first prescription was 8.8 (1.6) years; characteristics were mostly similar across patients with and without documented side effects inquiry. Rates of documented side effects inquiry were lower for telephone encounters than for in-clinic/telehealth encounters (51.9% vs 73.0%, P < .001). Side effects inquiry was documented in 61.4% of encounters after stimulant prescriptions and 48.5% of encounters after nonstimulant prescriptions (P = .041).
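The classification metrics reported above follow from a standard confusion-matrix calculation on the held-out labeled notes. A minimal sketch with synthetic labels (not the study data; the function name is illustrative):

```python
# Sketch of how sensitivity and specificity are derived from binary
# note-level predictions (1 = side effects inquiry documented).
# Labels below are synthetic examples, not the study data.

def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn)  # true-positive rate: inquiry correctly detected
    spec = tn / (tn + fp)  # true-negative rate: absence correctly flagged
    return sens, spec

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # manual medical record review
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]  # model output
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The area under the curve additionally requires the model's continuous scores rather than hard labels, so it is not reproducible from this sketch alone.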
Deploying an LLM on a variable set of clinical notes, including telephone notes, offered scalable measurement of quality of care and uncovered opportunities to improve psychopharmacological medication management in primary care.