Department of Pulmonary Vascular and Thrombotic Disease, Sixth Medical Center of Chinese People's Liberation Army General Hospital, Beijing, China.
Chinese People's Liberation Army Medical School, Beijing, China.
J Med Internet Res. 2023 Apr 24;25:e43153. doi: 10.2196/43153.
It remains unknown whether capturing data from electronic health records (EHRs) using natural language processing (NLP) can improve venous thromboembolism (VTE) detection in different clinical settings.
The aim of this study was to validate the NLP algorithm in DeVTEcare, a clinical decision support system for VTE risk assessment and integrated care, for identifying VTE from EHRs.
All inpatients aged ≥18 years at the Sixth Medical Center of the Chinese People's Liberation Army General Hospital from January 1 to December 31, 2021, were included as the validation cohort. The performance of the NLP tool was analyzed using sensitivity, specificity, positive and negative likelihood ratios (LR+ and LR-, respectively), area under the receiver operating characteristic curve (AUC), and F1-score, each with its 95% CI. Manual review of medical records served as the reference standard for detecting deep vein thrombosis (DVT) and pulmonary embolism (PE). The primary end point was the performance of the NLP approach embedded in the EHR for identifying VTE. The secondary end points were its performance across hospital departments and across departments with different levels of VTE risk. Subgroup analyses were performed by age, sex, and season of the study period.
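All of the metrics listed above derive from a single 2×2 table of NLP output against the manual-review reference standard. The minimal Python sketch below shows how they relate; the class and function names are hypothetical, and the counts in the usage lines are invented for illustration, not taken from the study. The AUC line assumes the NLP tool produces a single binary decision (one ROC operating point), in which case the trapezoidal AUC equals (sensitivity + specificity)/2. The abstract does not describe the authors' AUC computation, although the reported values are consistent with this: (0.899 + 0.998)/2 ≈ 0.95.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical 2x2 counts: NLP output vs. manual chart review (reference standard)."""
    tp: int  # NLP flags VTE, chart review confirms VTE
    fp: int  # NLP flags VTE, chart review finds no VTE
    fn: int  # NLP misses VTE that chart review confirms
    tn: int  # neither NLP nor chart review finds VTE


def diagnostic_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Compute the abstract's accuracy metrics from one 2x2 table."""
    sensitivity = cm.tp / (cm.tp + cm.fn)
    specificity = cm.tn / (cm.tn + cm.fp)
    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if cm.fp > 0 else float("inf")
    # LR- = (1 - sensitivity) / specificity
    lr_neg = (1 - sensitivity) / specificity
    # F1 = harmonic mean of precision (PPV) and sensitivity = 2TP / (2TP + FP + FN)
    f1 = 2 * cm.tp / (2 * cm.tp + cm.fp + cm.fn)
    # Assumption: a single binary threshold gives one interior ROC point,
    # so the trapezoidal AUC reduces to (sensitivity + specificity) / 2.
    auc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "f1": f1,
        "auc_single_threshold": auc,
    }


if __name__ == "__main__":
    # Invented counts for illustration only; not the study's data.
    example = ConfusionMatrix(tp=90, fp=20, fn=10, tn=9880)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Computing F1 directly from counts avoids a division by zero when the NLP tool flags no patients. The explicit guard on LR+ makes the 100%-specificity edge case visible rather than raising an error.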
Among 30,152 patients (median age 56 [IQR 41-67] years; 14,247/30,152, 47.3% female), the prevalence of VTE, PE, and DVT was 2.1% (626/30,152), 0.6% (177/30,152), and 1.8% (532/30,152), respectively. For NLP-facilitated VTE detection, the sensitivity, specificity, LR+, LR-, AUC, and F1-score were 89.9% (95% CI 87.3%-92.2%), 99.8% (95% CI 99.8%-99.9%), 483 (95% CI 370-629), 0.10 (95% CI 0.08-0.13), 0.95 (95% CI 0.94-0.96), and 0.90 (95% CI 0.90-0.91), respectively. Among the surgery, internal medicine, and intensive care unit departments, the surgery department had the highest specificity (100% vs 99.7% vs 98.8%, respectively), LR+ (3202 vs 321 vs 77, respectively), and F1-score (0.95 vs 0.89 vs 0.92, respectively) (all P<.001). Among departments with low, intermediate, and high VTE risk, the low-risk departments had the highest AUC (1.00 vs 0.94 vs 0.96, respectively) and F1-score (0.97 vs 0.90 vs 0.90, respectively) and the lowest LR- (0.00 vs 0.13 vs 0.08, respectively) (DeLong test for AUC; all P<.001). Subgroup analyses by age, sex, and season showed consistently good VTE detection, with sensitivity and specificity >87% and AUC and F1-score >0.89. The NLP algorithm performed better in patients aged ≤65 years than in those aged >65 years (F1-score 0.93 vs 0.89, respectively; P<.001).
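The abstract reports 95% CIs but does not state which interval methods were used. The Python sketch below assumes two common choices: the Wilson score interval for proportions such as sensitivity and specificity, and the log method for LR+. These are illustrative assumptions, not necessarily the authors' methods, and the function names are hypothetical. The sketch also recomputes the reported prevalence percentages from the published counts.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def wilson_ci(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as sensitivity = TP / (TP + FN)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def lr_pos_log_ci(tp: int, fn: int, fp: int, tn: int,
                  z: float = Z_95) -> tuple[float, float, float]:
    """LR+ with a log-method CI: SE(ln LR+) = sqrt(1/TP - 1/(TP+FN) + 1/FP - 1/(FP+TN))."""
    if tp == 0 or fp == 0:
        raise ValueError("log-method CI needs TP > 0 and FP > 0")
    lr = (tp / (tp + fn)) / (fp / (fp + tn))
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return lr, lr * math.exp(-z * se_log), lr * math.exp(z * se_log)


if __name__ == "__main__":
    # Prevalences recomputed from the counts reported in the abstract (N = 30,152).
    for label, cases in (("VTE", 626), ("PE", 177), ("DVT", 532)):
        print(f"{label}: {cases / 30152:.1%}")
    # Invented counts for illustration only; not the study's confusion matrix.
    print("sensitivity 95% CI:", wilson_ci(90, 100))
    print("LR+ and 95% CI:", lr_pos_log_ci(tp=90, fn=10, fp=20, tn=9880))
```

The log-method interval is symmetric on the log scale, so it extends further above the point estimate than below. This is consistent with the asymmetric shape of the reported LR+ interval (370-629 around 483), though the abstract does not confirm the method.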
The NLP algorithm in our DeVTEcare performed well in identifying VTE across different clinical settings, especially in surgery units, departments with low VTE risk, and patients aged ≤65 years. This algorithm can help provide accurate in-hospital VTE rates and strengthen risk-stratified integrated VTE care in future research.