Thompson Daniel C, Mofidi Reza
Vascular Surgery Specialty Training, Health Education England North East, Newcastle upon Tyne, UK.
Academic Department of Military Surgery & Trauma, Royal Centre for Defence Medicine, Birmingham, UK.
Sci Rep. 2025 Jul 21;15(1):26388. doi: 10.1038/s41598-025-11870-6.
Patient identification for national registries often relies on clinician recognition of cases or on retrospective searches using potentially inaccurate clinical codes, leading to incomplete data capture and inefficiencies. Natural Language Processing (NLP) offers a promising solution by automating analysis of electronic health records (EHRs). This study aimed to develop NLP models for identifying and classifying abdominal aortic aneurysm (AAA) repairs from unstructured EHRs, demonstrating a proof-of-concept for automated patient identification in registries like the National Vascular Registry. Using the MIMIC-IV-Note dataset, a multi-tiered approach was developed to identify vascular patients (Task 1), identify AAA repairs (Task 2), and classify repairs as primary or revision (Task 3). Four NLP models were trained and evaluated using 4870 annotated records: scispaCy, BERT-base, Bio-clinicalBERT, and a scispaCy/Bio-clinicalBERT ensemble. Models were compared using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). The scispaCy model demonstrated the fastest training (2 min/epoch) and the fastest inference (2.87 samples/sec). For Task 1, the scispaCy and ensemble models achieved the highest accuracy (0.97). In Task 2, all models performed exceptionally well, with the ensemble, scispaCy, and Bio-clinicalBERT models achieving 0.99 accuracy and 1.00 AUC. For Task 3, Bio-clinicalBERT and the ensemble model achieved an AUC of 1.00, with Bio-clinicalBERT displaying the best overall accuracy (0.98). This study demonstrates that NLP models can accurately identify and classify AAA repair cases from unstructured EHRs, suggesting significant potential for automating patient identification in vascular surgery and other medical registries, reducing administra.
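The five comparison metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names `binary_metrics` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the abstract does not say which tooling was used. AUC is computed here through its rank interpretation (the probability that a randomly chosen positive record scores higher than a randomly chosen negative one).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = "AAA repair", 0 = "not an AAA repair".
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]   # model-assigned probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

On this toy data the thresholded predictions contain one false positive and one false negative, so accuracy, precision, recall, and F1 all equal 0.75, while the AUC is 15/16 = 0.9375 because it depends only on how the scores rank the records, not on the threshold. This is one reason a model can report an AUC of 1.00 alongside an accuracy below 1.00.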