Cheng You, Malekar Mrunal, He Yingnan, Bommareddy Apoorva, Magdamo Colin, Singh Arjun, Westover Brandon, Mukerji Shibani S, Dickson John, Das Sudeshna
Department of Neurology, Massachusetts General Hospital, Cambridge, MA, United States.
Harvard Medical School, Boston, MA, United States.
JMIR AI. 2025 Jun 3;4:e66926. doi: 10.2196/66926.
Alzheimer disease and related dementias (ADRD) are complex disorders with overlapping symptoms and pathologies. Comprehensive records of symptoms in electronic health records (EHRs) are critical not only for reaching an accurate diagnosis but also for supporting ongoing research studies and clinical trials. However, these symptoms are frequently buried in unstructured clinical notes in EHRs, making manual extraction both time-consuming and labor-intensive.
We aimed to automate symptom extraction from the clinical notes of patients with ADRD using fine-tuned large language models (LLMs), compare their performance with regular expression-based symptom recognition, and validate the results using brain magnetic resonance imaging (MRI) data.
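For readers unfamiliar with the regular expression-based baseline mentioned above, the following is a minimal sketch of what keyword-style symptom recognition can look like. The domain patterns and the function name `regex_symptom_domains` are hypothetical and purely illustrative; they are not the lexicon or code used in the study.

```python
import re
from typing import Dict

# Hypothetical keyword patterns for three of the seven domains.
# Illustrative only; not the study's actual lexicon.
DOMAIN_PATTERNS = {
    "memory": re.compile(r"\b(forgetful\w*|memory (loss|impairment))\b", re.IGNORECASE),
    "motor": re.compile(r"\b(tremor|shuffling gait|bradykinesia|rigidity)\b", re.IGNORECASE),
    "sleep": re.compile(r"\b(insomnia|sleep apnea|daytime sleepiness)\b", re.IGNORECASE),
}

def regex_symptom_domains(note: str) -> Dict[str, bool]:
    """Flag each domain whose pattern matches anywhere in the note text."""
    return {domain: bool(pattern.search(note)) for domain, pattern in DOMAIN_PATTERNS.items()}

note = "Wife reports he is increasingly forgetful. Denies tremor or insomnia."
flags = regex_symptom_domains(note)
# Plain keyword matching ignores the negation ("Denies ..."), so the motor
# and sleep domains are flagged here even though the note rules them out.
```

Insensitivity to negation and context is a general limitation of plain keyword matching, which this toy note is constructed to show.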
We fine-tuned LLMs to extract ADRD symptoms across the following 7 domains: memory, executive function, motor, language, visuospatial, neuropsychiatric, and sleep. We assessed the algorithm's performance by calculating the area under the receiver operating characteristic curve (AUROC) for each domain. The extracted symptoms were then validated in two analyses: (1) predicting ADRD diagnosis using the counts of extracted symptoms and (2) examining the association between ADRD symptoms and MRI-derived brain volumes.
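The per-domain AUROC evaluation can be illustrated with a small, dependency-free sketch. It uses the rank interpretation of AUROC: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores below are toy values invented for illustration, not data from the study.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (invented): gold labels and model scores for two domains.
toy = {
    "memory": ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    "sleep": ([1, 0, 0, 1], [0.8, 0.6, 0.1, 0.5]),
}
per_domain = {domain: auroc(y, s) for domain, (y, s) in toy.items()}
# per_domain == {"memory": 1.0, "sleep": 0.75}
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a sorted-rank implementation would be preferable for large evaluation sets.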
Symptom extraction across the 7 domains achieved high accuracy with AUROCs ranging from 0.97 to 0.99. Using the counts of extracted symptoms to predict ADRD diagnosis yielded an AUROC of 0.83 (95% CI 0.77-0.89). Symptom associations with brain volumes revealed that a smaller hippocampal volume was linked to memory impairments (odds ratio 0.62, 95% CI 0.46-0.84; P=.006), and reduced pallidum size was associated with motor impairments (odds ratio 0.73, 95% CI 0.58-0.90; P=.04).
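As a reading aid for the odds ratios above, the sketch below shows how an odds ratio and a Wald-type 95% CI relate to a log-odds coefficient and its standard error. It assumes the associations came from a logistic regression with Wald intervals, which the abstract does not state; the function names are hypothetical, and the scaling of the brain-volume predictor is unknown.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its SE to (OR, lower, upper) with a Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def implied_coefficient(odds_ratio, lower, upper, z=1.96):
    """Back out the coefficient and SE implied by a reported OR and its CI (assumes a Wald CI)."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Reported hippocampal volume vs memory impairment: OR 0.62 (95% CI 0.46-0.84).
beta, se = implied_coefficient(0.62, 0.46, 0.84)
or_, lo, hi = odds_ratio_ci(beta, se)
# beta is negative, so larger volume goes with lower odds of memory impairment.
```

An odds ratio below 1 corresponds to a negative coefficient, which is consistent with the stated direction: smaller hippocampal volume, higher odds of memory impairment.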
These results highlight the accuracy and reliability of our high-throughput ADRD phenotyping algorithm. By enabling automated symptom extraction, our approach has the potential to assist with differential diagnosis, as well as facilitate clinical trials and research studies of dementia.