Machine learning natural language processing for identifying venous thromboembolism: systematic review and meta-analysis.
Author information
Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA.
Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA.
Publication information
Blood Adv. 2024 Jun 25;8(12):2991-3000. doi: 10.1182/bloodadvances.2023012200.
Venous thromboembolism (VTE) is a leading cause of preventable in-hospital mortality. Monitoring VTE cases is limited by the challenges of manual medical record review and diagnosis code interpretation. Natural language processing (NLP) can automate the process. Rule-based NLP methods are effective but time-consuming to develop. Machine learning (ML)-NLP methods present a promising solution. We conducted a systematic review and meta-analysis of studies published before May 2023 that used ML-NLP to identify VTE diagnoses in electronic health records. Four reviewers screened all manuscripts, excluding studies that used only a rule-based method. The meta-analysis pooled the performance of each study's best-performing model for identifying pulmonary embolism and/or deep vein thrombosis. Pooled sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), each with a 95% confidence interval (CI), were calculated with a random-effects model using the DerSimonian and Laird method. Study quality was assessed using an adapted TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) tool. Thirteen studies were included in the systematic review, and 8 had data available for meta-analysis. Pooled sensitivity was 0.931 (95% CI, 0.881-0.962), specificity 0.984 (95% CI, 0.967-0.992), PPV 0.910 (95% CI, 0.865-0.941), and NPV 0.985 (95% CI, 0.977-0.990). All studies met at least 13 of the 21 NLP-modified TRIPOD items, demonstrating fair quality. The highest-performing models used vectorization rather than bag-of-words representations, together with deep-learning techniques such as convolutional neural networks. There was significant heterogeneity among the studies, and only 4 validated their model on an external data set. Further standardization of ML studies can help progress this novel technology toward real-world implementation.
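To illustrate the pooling step named in the abstract, the following is a minimal sketch of DerSimonian and Laird random-effects pooling for a proportion such as sensitivity. It is not the authors' code. The abstract does not say on which scale the proportions were pooled, so the logit transform and the 0.5 continuity correction here are assumptions, the function name `pool_proportions_dl` is hypothetical, and the counts in the usage example are invented for illustration and are not taken from the included studies.

```python
import math


def pool_proportions_dl(events, totals, z=1.96):
    """Pool per-study proportions with DerSimonian-Laird random effects.

    Assumption: pooling is done on the logit scale and back-transformed.
    Returns (pooled, ci_low, ci_high, tau2).
    """
    y, v = [], []  # per-study logit estimates and their variances
    for e, n in zip(events, totals):
        # Assumed 0.5 continuity correction keeps the logit finite
        # when a study reports 0% or 100%.
        if e == 0 or e == n:
            e, n = e + 0.5, n + 1.0
        y.append(math.log(e / (n - e)))
        v.append(1.0 / e + 1.0 / (n - e))

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird moment estimate of between-study variance tau^2.
    k = len(y)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(mu), expit(mu - z * se), expit(mu + z * se), tau2


if __name__ == "__main__":
    # Hypothetical true-positive counts and numbers of VTE-positive cases.
    true_pos = [45, 88, 130, 27]
    positives = [50, 95, 140, 30]
    est, lo, hi, tau2 = pool_proportions_dl(true_pos, positives)
    print(f"pooled sensitivity {est:.3f} (95% CI {lo:.3f}-{hi:.3f}), tau^2={tau2:.3f}")
```

Under these assumptions, the same function could be applied to specificity, PPV, and NPV by passing the corresponding numerators and denominators. A nonzero `tau2` indicates between-study heterogeneity, which widens the pooled interval relative to a fixed-effect analysis.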