Guan Hao, Novoa-Laurentiev John, Zhou Li
Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, USA.
J Biomed Inform. 2025 Jun;166:104830. doi: 10.1016/j.jbi.2025.104830. Epub 2025 May 2.
Early detection of cognitive decline during the preclinical stage of Alzheimer's disease and related dementias (AD/ADRD) is crucial for timely intervention and treatment. Clinical notes in the electronic health record contain valuable information that can aid in the early identification of cognitive decline. In this study, we utilize advanced large clinical language models, fine-tuned on clinical notes, to improve the early detection of cognitive decline.
We collected clinical notes for 2,166 patients, spanning the 4 years preceding each patient's initial mild cognitive impairment (MCI) diagnosis, from the Enterprise Data Warehouse of Mass General Brigham. We developed CD-Tron, built upon a large clinical language model that was fine-tuned using 4,949 expert-labeled note sections. For evaluation, the trained model was applied to 1,996 independent note sections to assess its performance on real-world unstructured clinical data. Additionally, we used explainable AI techniques, specifically SHAP (SHapley Additive exPlanations) values, to interpret the model's predictions and identify the most influential features. We also conducted an error analysis to further examine the model's predictions.
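To make the idea behind SHAP values concrete, the following is a minimal sketch of an exact Shapley value computation over the tokens of a short note fragment. It is not the CD-Tron pipeline and does not use the `shap` library: the scoring function `toy_score`, its weights in `TOY_WEIGHTS`, and the example tokens are hypothetical stand-ins for a real model's output, chosen only to illustrate how a token's attribution is its average marginal contribution across all subsets of the other tokens.

```python
from itertools import combinations
from math import factorial

# Hypothetical per-token weights standing in for a real model's behavior.
TOY_WEIGHTS = {"forgetful": 2.0, "confusion": 1.5, "denies": -1.0}


def toy_score(present_tokens):
    """Hypothetical scorer: higher means more suggestive of cognitive decline."""
    score = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in present_tokens)
    # Toy interaction: a negation cancels the evidence from "confusion".
    if "denies" in present_tokens and "confusion" in present_tokens:
        score -= 1.5
    return score


def shapley_values(tokens, value_fn):
    """Exact Shapley value of each token position under value_fn.

    Enumerates every subset of the other tokens, so the cost grows
    exponentially with the number of tokens; only suitable for tiny inputs.
    """
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = value_fn([tokens[j] for j in sorted(subset + (i,))])
                without_i = value_fn([tokens[j] for j in subset])
                phi[i] += weight * (with_i - without_i)
    return phi


if __name__ == "__main__":
    tokens = ["patient", "denies", "confusion"]
    for tok, value in zip(tokens, shapley_values(tokens, toy_score)):
        print(f"{tok}: {value:+.3f}")
```

The attributions sum to the difference between the score of the full input and the score of the empty input, which is the property that lets per-token contributions be read as a decomposition of a single prediction. Practical SHAP implementations approximate this enumeration rather than computing it exactly.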
CD-Tron significantly outperformed baseline models, with improvements in precision, recall, and AUC for detecting cognitive decline (CD). Tested on many real-world clinical notes, CD-Tron demonstrated high sensitivity with only one false negative, which is crucial for clinical applications that prioritize early and accurate CD detection. SHAP-based interpretability analysis highlighted key textual features contributing to model predictions, supporting transparency and clinician understanding.
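For readers less familiar with the reported metrics, the sketch below shows how precision, recall (sensitivity), and AUC can be computed from binary labels and model scores. The labels, scores, and the 0.5 decision threshold are illustrative assumptions, not data or settings from the study, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision = tp / (tp + fp); recall (sensitivity) = tp / (tp + fn)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (pairwise formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative labels and scores only; not taken from the study.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
    print(precision_recall(y_true, y_pred), roc_auc(y_true, scores))
```

Each false negative lowers recall directly, which is why a single false negative on a large test set corresponds to high sensitivity.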
CD-Tron offers a novel approach to early cognitive decline detection by applying large clinical language models to free-text EHR data. Pretrained on real-world clinical notes, it accurately identifies early cognitive decline and integrates SHAP for interpretability, enhancing transparency in predictions.