Ge Wendong, Godeiro Coelho Lilian M, Donahue Maria A, Rice Hunter J, Blacker Deborah, Hsu John, Newhouse Joseph P, Hernández-Díaz Sonia, Haneuse Sebastien, Westover Brandon, Moura Lidia M V R
Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, United States.
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States.
Am J Epidemiol. 2025 Apr 8;194(4):1097-1105. doi: 10.1093/aje/kwae240.
Fall-related injuries (FRIs) are a major cause of hospitalizations among older patients, but identifying them in unstructured clinical notes poses challenges for large-scale research. In this study, we developed and evaluated natural language processing (NLP) models to address this issue. We utilized all available clinical notes from the Mass General Brigham health-care system for 2100 older adults, identifying 154 949 paragraphs of interest through automatic scanning for FRI-related keywords. Two clinical experts directly labeled 5000 paragraphs to generate benchmark-standard labels, while 3689 validated patterns were annotated, indirectly labeling 93 157 paragraphs as validated-standard labels. Five NLP models, including vanilla bidirectional encoder representations from transformers (BERT), the robustly optimized BERT approach (RoBERTa), ClinicalBERT, DistilBERT, and support vector machine (SVM), were trained using 2000 benchmark paragraphs and all validated paragraphs. BERT-based models were trained in 3 stages: masked language modeling, general boolean question-answering, and question-answering for FRIs. For validation, 500 benchmark paragraphs were used, and the remaining 2500 were used for testing. Performance metrics (precision, recall, F1 score, area under the receiver operating characteristic curve [AUROC], and area under the precision-recall curve [AUPR]) were compared across models, with RoBERTa showing the best performance. Precision was 0.90 (95% CI, 0.88-0.91), recall was 0.91 (95% CI, 0.90-0.93), the F1 score was 0.91 (95% CI, 0.89-0.92), and the AUROC and AUPR were both 0.96 (95% CI, 0.95-0.97). These NLP models accurately identify FRIs from unstructured clinical notes, potentially enhancing clinical-notes-based research efficiency.
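As an illustration of how metrics of this kind can be computed on a labeled test set, the following is a minimal sketch in plain Python. It is not the authors' code: the abstract does not state how the confidence intervals were derived, so the percentile bootstrap used here is an assumption, and the function names and toy labels are hypothetical.

```python
import random


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, with 1 = FRI paragraph."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for precision, recall, and F1.

    Assumption: paragraphs are resampled with replacement as independent units.
    """
    rng = random.Random(seed)
    n = len(y_true)
    samples = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(
            precision_recall_f1([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    intervals = []
    for k in range(3):  # 0 = precision, 1 = recall, 2 = F1
        values = sorted(s[k] for s in samples)
        lower = values[int((alpha / 2) * n_resamples)]
        upper = values[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
        intervals.append((lower, upper))
    return intervals


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

point_estimates = precision_recall_f1(y_true, y_pred)
intervals = bootstrap_ci(y_true, y_pred)
```

Threshold-free metrics such as AUROC and AUPR would additionally require the model's predicted probabilities rather than hard labels, and are omitted from this sketch.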