Machine learning natural language processing for identifying venous thromboembolism: systematic review and meta-analysis.

Affiliations

Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA.

Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA.

Publication information

Blood Adv. 2024 Jun 25;8(12):2991-3000. doi: 10.1182/bloodadvances.2023012200.

DOI:10.1182/bloodadvances.2023012200

PMID:38522096

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11215191/

Abstract

Venous thromboembolism (VTE) is a leading cause of preventable in-hospital mortality. Monitoring VTE cases is limited by the challenges of manual medical record review and diagnosis code interpretation. Natural language processing (NLP) can automate the process. Rule-based NLP methods are effective but time consuming. Machine learning (ML)-NLP methods present a promising solution. We conducted a systematic review and meta-analysis of studies published before May 2023 that use ML-NLP to identify VTE diagnoses in the electronic health records. Four reviewers screened all manuscripts, excluding studies that only used a rule-based method. A meta-analysis evaluated the pooled performance of each study's best performing model that evaluated for pulmonary embolism and/or deep vein thrombosis. Pooled sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) with confidence interval (CI) were calculated by DerSimonian and Laird method using a random-effects model. Study quality was assessed using an adapted TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) tool. Thirteen studies were included in the systematic review and 8 had data available for meta-analysis. Pooled sensitivity was 0.931 (95% CI, 0.881-0.962), specificity 0.984 (95% CI, 0.967-0.992), PPV 0.910 (95% CI, 0.865-0.941) and NPV 0.985 (95% CI, 0.977-0.990). All studies met at least 13 of the 21 NLP-modified TRIPOD items, demonstrating fair quality. The highest performing models used vectorization rather than bag-of-words and deep-learning techniques such as convolutional neural networks. There was significant heterogeneity in the studies, and only 4 validated their model on an external data set. Further standardization of ML studies can help progress this novel technology toward real-world implementation.
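The pooling step named in the abstract (DerSimonian and Laird, random-effects) can be illustrated with a minimal sketch. The abstract does not say how proportions were transformed, so the logit transformation and the 0.5 continuity correction below are assumptions, and the function name `dersimonian_laird` and the per-study counts are hypothetical illustrations, not data from the review.

```python
import math


def dersimonian_laird(events, totals):
    """Pool proportions (e.g. per-study sensitivity) with a DerSimonian-Laird
    random-effects model on the logit scale (the transformation is an assumption).

    events[i] / totals[i] is the proportion in study i, for example
    true positives / all reference-standard positives.
    Returns (pooled, ci_low, ci_high, tau2).
    """
    effects, variances = [], []
    for e, n in zip(events, totals):
        # Assumed 0.5 continuity correction keeps the logit finite at 0% or 100%.
        e_c = e + 0.5
        f_c = (n - e) + 0.5
        effects.append(math.log(e_c / f_c))
        variances.append(1.0 / e_c + 1.0 / f_c)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Method-of-moments estimate of between-study variance, truncated at zero.
    k = len(effects)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), tau2


# Hypothetical counts for four studies: true positives and total positives.
hypothetical_tp = [95, 180, 42, 310]
hypothetical_pos = [100, 200, 50, 320]
pooled, low, high, tau2 = dersimonian_laird(hypothetical_tp, hypothetical_pos)
print(f"pooled sensitivity {pooled:.3f} (95% CI {low:.3f}-{high:.3f}), tau^2 {tau2:.3f}")
```

The same function applies to specificity, PPV, or NPV by passing the matching numerators and denominators. Bivariate models that pool sensitivity and specificity jointly are a common alternative for diagnostic accuracy data; this sketch pools each metric separately.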

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd9a/11215191/46655e6d50cd/BLOODA_ADV-2023-012200-ga1.jpg

Similar articles

The use of natural language processing on pediatric diagnostic radiology reports in the electronic health record to identify deep venous thrombosis in children.

J Thromb Thrombolysis. 2017 Oct;44(3):281-290. doi: 10.1007/s11239-017-1532-y.

Natural Language Processing in a Clinical Decision Support System for the Identification of Venous Thromboembolism: Algorithm Development and Validation.

J Med Internet Res. 2023 Apr 24;25:e43153. doi: 10.2196/43153.

Natural Language Processing Performance for the Identification of Venous Thromboembolism in an Integrated Healthcare System.

Clin Appl Thromb Hemost. 2021 Jan-Dec;27:10760296211013108. doi: 10.1177/10760296211013108.

Developing and validating natural language processing algorithms for radiology reports compared to ICD-10 codes for identifying venous thromboembolism in hospitalized medical patients.

Thromb Res. 2022 Jan;209:51-58. doi: 10.1016/j.thromres.2021.11.020. Epub 2021 Nov 27.

A novel method of adverse event detection can accurately identify venous thromboembolisms (VTEs) from narrative electronic health record data.

J Am Med Inform Assoc. 2015 Jan;22(1):155-65. doi: 10.1136/amiajnl-2014-002768. Epub 2014 Oct 20.

Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study.

JMIR Bioinform Biotechnol. 2022 May 8;3(1):e36877. doi: 10.2196/36877.

Automated Extraction of VTE Events From Narrative Radiology Reports in Electronic Health Records: A Validation Study.

Med Care. 2017 Oct;55(10):e73-e80. doi: 10.1097/MLR.0000000000000346.

Development of a predictive model of venous thromboembolism recurrence in anticoagulated cancer patients using machine learning.

Thromb Res. 2023 Aug;228:181-188. doi: 10.1016/j.thromres.2023.06.015. Epub 2023 Jun 16.

Comparison of Natural Language Processing of Clinical Notes With a Validated Risk-Stratification Tool to Predict Severe Maternal Morbidity.

JAMA Netw Open. 2022 Oct 3;5(10):e2234924. doi: 10.1001/jamanetworkopen.2022.34924.

Cited by

Generative artificial intelligence for automated data extraction from unstructured medical text.

JAMIA Open. 2025 Sep 4;8(5):ooaf097. doi: 10.1093/jamiaopen/ooaf097. eCollection 2025 Oct.

Large language models for chart review: how machine learning can accelerate hematology research.

Blood Vessel Thromb Hemost. 2025 Jan 15;2(1):100052. doi: 10.1016/j.bvth.2025.100052. eCollection 2025 Feb.

Development and Validation of VTE-BERT Natural Language Processing Model for Venous Thromboembolism.

J Thromb Haemost. 2025 Aug 1. doi: 10.1016/j.jtha.2025.07.021.

Using a transformer language model to curate a pulmonary embolism dataset from the Medical Information Mart for Intensive Care IV: MIMIC-IV-Ext-PE.

Res Pract Thromb Haemost. 2025 May 21;9(4):102896. doi: 10.1016/j.rpth.2025.102896. eCollection 2025 May.

Comparing efficiency of an attention-based deep learning network with contemporary radiological workflow for pulmonary embolism detection on CTPA: A retrospective study.

Eur J Radiol Open. 2025 May 9;14:100657. doi: 10.1016/j.ejro.2025.100657. eCollection 2025 Jun.

Artificial intelligence in thrombosis: transformative potential and emerging challenges.

Thromb J. 2025 Jan 16;23(1):2. doi: 10.1186/s12959-025-00690-3.
