Natural Language Processing and Coding for Detecting Bleeding Events in Discharge Summaries: Comparative Cross-Sectional Study.

Author information

Gaspar Frederic, Zayene Mehdi, Coumau Claire, Bertrand Elliott, Bettex Marie, Le Pogam Marie Annick, Csajka Chantal

Affiliations

Center for Research and Innovation in Clinical Pharmaceutical Sciences, Rue du Bugnon 19, Lausanne, 1011, Switzerland, 41 763306834.

School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland.

Publication information

JMIR Med Inform. 2025 Aug 29;13:e67837. doi: 10.2196/67837.

BACKGROUND

Bleeding adverse drug events (ADEs), particularly among older inpatients receiving antithrombotic therapy, represent a major safety concern in hospitals. These events are often underdetected by conventional rule-based systems relying on structured electronic medical record data, such as the ICD-10 (International Statistical Classification of Diseases and Related Health Problems 10th Revision) codes, which lack the granularity to capture nuanced clinical narratives.

OBJECTIVE

This study aimed to develop and evaluate a natural language processing (NLP) model to detect and categorize bleeding ADEs in discharge summaries of older adults. Specifically, the model was designed to distinguish between "clinically significant bleeding," "severe bleeding," "history of bleeding," and "no bleeding," and was compared with a rule-based algorithm using ICD-10 codes.

METHODS

Clinicians manually annotated 400 discharge summaries, comprising 65,706 sentences, into four categories: "no bleeding," "clinically significant bleeding," "severe bleeding," and "history of bleeding." The dataset was divided into a training set (70%, 47,100 sentences) and a test set (30%, 18,606 sentences). Two detection approaches were developed and evaluated: (1) an NLP model using binary logistic regression and support vector machine classifiers, and (2) a traditional rule-based algorithm relying exclusively on predefined ICD-10 codes. To address class imbalance, with most sentences categorized as irrelevant ("no bleeding"), a class-weighting strategy was applied in the NLP model. Model performance was assessed using accuracy, precision, recall, F1-score, and receiver operating characteristic (ROC) curve analyses, with manual annotations as the gold standard.
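The class-weighting and macro-averaged scoring described above can be illustrated with a small standard-library sketch. The abstract does not state which weighting scheme was used; the inverse-frequency formula below (n_samples / (n_classes * class_count)) is an assumption chosen because it is a common default, and the toy label counts are invented for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    ASSUMPTION: the study's actual weighting scheme is not given in the abstract.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Toy, invented sentence labels with heavy imbalance toward "no bleeding".
toy_labels = (
    ["no bleeding"] * 90
    + ["clinically significant bleeding"] * 5
    + ["severe bleeding"] * 2
    + ["history of bleeding"] * 3
)
weights = balanced_class_weights(toy_labels)

# A classifier that always predicts the majority class is 90% accurate here,
# yet its macro F1 is low because three of the four classes are never found.
majority_pred = ["no bleeding"] * len(toy_labels)
majority_macro_f1 = macro_f1(toy_labels, majority_pred)
```

In this toy set the rarest class ("severe bleeding") receives the largest weight, so misclassifying it would cost the model more during training. The majority-class baseline shows why macro-averaged scores are more informative than raw accuracy under this kind of imbalance.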

RESULTS

The NLP model significantly outperformed the rule-based approach across all evaluation metrics. At the document level, the NLP model achieved macro-average scores of 0.81 for accuracy and 0.80 for F1-score. Precision was particularly high for detecting severe (0.92) and clinically significant bleeding events (0.87), demonstrating strong classification capability despite class imbalance. ROC analyses confirmed the model's robust diagnostic performance, yielding an area under the curve (AUC) of 0.91 when distinguishing irrelevant sentences from potential bleeding events, 0.88 for identifying historical mentions of bleeding, and notably, 0.94 for differentiating clinically significant from severe bleeding. In contrast, the rule-based ICD-10 model demonstrated high precision (0.94) for clinically significant bleeding but poor recall (0.03) for severe bleeding events, reflecting frequent missed detections. This limitation arose due to its reliance on commonly used ICD-10 codes (eg, gastrointestinal hemorrhage) and inadequate capture of rare severe bleeding conditions such as shock due to hemorrhage.
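The combination of high precision and low recall reported for the rule-based approach can be illustrated with a minimal sketch of code-list matching. The code list, the toy hospital stays, and the function names below are all hypothetical; the abstract does not publish the study's actual ICD-10 list or algorithm.

```python
# ASSUMPTION: illustrative code prefixes only, not the study's list.
# K92.2 = gastrointestinal haemorrhage, unspecified; I61 = intracerebral
# haemorrhage; R58 = haemorrhage, not elsewhere classified.
BLEEDING_CODE_PREFIXES = ("K92.2", "I61", "R58")


def flags_bleeding(icd10_codes, prefixes=BLEEDING_CODE_PREFIXES):
    """Return True if any coded diagnosis starts with a listed prefix."""
    return any(code.upper().startswith(prefixes) for code in icd10_codes)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (bleeding) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented stays: (coded diagnoses, bleeding present according to annotators).
toy_stays = [
    (["K92.2", "I10"], True),  # coded GI bleed: detected
    (["I48"], True),           # bleed described only in the narrative: missed
    (["R57.1"], True),         # hypovolaemic shock, code not on the list: missed
    (["I10"], False),
    (["E11.9"], False),
]
y_true = [truth for _, truth in toy_stays]
y_pred = [flags_bleeding(codes) for codes, _ in toy_stays]
precision, recall = precision_recall(y_true, y_pred)
```

Every stay the rule flags is a true bleed, so precision is perfect, but two of the three bleeds are never coded with a listed code, so recall is only one third. This mirrors, in miniature, the failure mode the results attribute to reliance on a fixed list of commonly used codes.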

CONCLUSIONS

This study highlights the considerable advantage of NLP over traditional ICD-10-based methods for detecting bleeding ADEs within electronic medical records. The NLP model effectively captured nuanced clinical narratives, including severity, negations, and historical bleeding events, demonstrating substantial promise for improving patient safety surveillance and clinical decision-making. Future research should extend validation across multiple institutions, diversify annotated datasets, and further refine temporal reasoning capabilities within NLP algorithms.