
Development and Validation of VTE-BERT Natural Language Processing Model for Venous Thromboembolism.

Author information

Jafari Omid, Ma Shengling, Lam Barbara D, Jiang Jun Y, Zhou Emily, Ranjan Mrinal, Ryu Justine, Bandyo Raka, Maghsoudi Arash, Peng Bo, Amos Christopher I, Oluyomi Abiodun, Fillmore Nathanael R, La Jennifer, Li Ang

Affiliations

Section of Hematology-Oncology, Baylor College of Medicine, Houston, TX.

Division of Hematology & Oncology, Fred Hutch Cancer Center, University of Washington.

Publication information

J Thromb Haemost. 2025 Aug 1. doi: 10.1016/j.jtha.2025.07.021.

PMID: 40754035
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12360494/
Abstract

BACKGROUND

Accurate and rapid phenotyping of venous thromboembolism (VTE) in longitudinal studies is important. A natural language processing (NLP) tool externally validated in representative patients is lacking.

METHODS

We designed a novel NLP platform, NLPMed, to assist thrombosis researchers with data preprocessing, phenotype annotation, language model finetuning, and NLP application. Utilizing clinical notes, discharge summaries, and radiology reports in patients with cancer from two healthcare institutions, we finetuned Bio_ClinicalBERT to develop VTE-BERT. The new model was trained to detect acute VTE events and their anatomical locations longitudinally. We internally and externally validated the model's performance in two randomly sampled cohorts of patients with advanced cancer.
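The abstract does not detail NLPMed's data preprocessing, but the Results describe cohorts restricted to notes containing at least one VTE keyword. The Python sketch below illustrates that kind of prefilter under stated assumptions: `VTE_KEYWORDS`, the `Note` record, and both functions are hypothetical names, and the keyword list is illustrative only, since the actual keyword set is not published in the abstract. It keeps keyword-containing notes and groups them per patient in date order so that a downstream note classifier could be applied longitudinally.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword list for illustration only; the abstract does not
# publish the VTE keyword set actually used to select notes.
VTE_KEYWORDS = [
    "deep vein thrombosis",
    "dvt",
    "pulmonary embolism",
    "thrombus",
    "thrombosis",
    "vte",
]

# Case-insensitive, word-bounded pattern so short terms such as "dvt"
# do not match inside unrelated tokens.
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in VTE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


@dataclass
class Note:
    """Hypothetical minimal record for one clinical document."""
    patient_id: str
    note_date: date
    note_type: str  # e.g. "clinical", "discharge", "radiology"
    text: str


def has_vte_keyword(text: str) -> bool:
    """Return True if the note mentions at least one VTE keyword."""
    return _KEYWORD_RE.search(text) is not None


def select_candidate_notes(notes: list[Note]) -> dict[str, list[Note]]:
    """Keep notes with >= 1 VTE keyword, grouped per patient in date order."""
    by_patient: dict[str, list[Note]] = {}
    for note in notes:
        if has_vte_keyword(note.text):
            by_patient.setdefault(note.patient_id, []).append(note)
    # Sort each patient's notes chronologically for longitudinal review.
    for patient_notes in by_patient.values():
        patient_notes.sort(key=lambda n: n.note_date)
    return by_patient
```

A keyword filter only narrows the candidate set: a note such as "Pulmonary embolism ruled out." passes it, so deciding whether a note describes an acute event (and where) is left to the fine-tuned classifier. Word-boundary matching also misses plurals like "DVTs", so any real keyword list would need tuning against annotated notes.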

RESULTS

The training cohort consisted of 715 patients and 14,013 annotated notes with ≥1 VTE keyword from the Harris Health System (HHS). The internal validation cohort included 400 additional patients with 7,190 VTE keyword-containing notes from HHS. The external validation cohort included 400 patients with 7,371 VTE keyword-containing notes from the National Veterans Affairs Healthcare System. VTE-BERT was trained until it reached a precision of 95% and recall of 98% at the patient level. Using independent datasets, the model achieved precision and recall of 95% and 91% in internal validation and of 85% and 92% in external validation.
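Precision and recall are reported at the patient level, while a BERT classifier scores individual notes. The Python sketch below shows one plausible way to bridge the two; the aggregation rule (a patient counts as positive if any of their keyword notes is predicted positive) and the function names are assumptions for illustration, not the authors' published method. Patients who appear in the chart-review reference but have no predicted notes are treated as predicted negative.

```python
from collections import defaultdict


def patient_level_labels(note_predictions):
    """Collapse (patient_id, predicted_positive) note pairs to one label per patient.

    Assumed rule: a patient is positive if ANY of their notes is predicted positive.
    """
    labels = defaultdict(bool)
    for patient_id, is_positive in note_predictions:
        labels[patient_id] = labels[patient_id] or is_positive
    return dict(labels)


def precision_recall(predicted: dict[str, bool], reference: dict[str, bool]) -> tuple[float, float]:
    """Patient-level precision and recall against reference (chart-review) labels.

    Patients missing from `predicted` are counted as predicted negative.
    """
    tp = fp = fn = 0
    for patient_id, truly_positive in reference.items():
        called_positive = predicted.get(patient_id, False)
        if called_positive and truly_positive:
            tp += 1
        elif called_positive and not truly_positive:
            fp += 1
        elif not called_positive and truly_positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

As a rough cross-check on the reported figures, `f1(0.95, 0.91)` is about 0.93 for internal validation and `f1(0.85, 0.92)` is about 0.88 for external validation; the abstract itself does not report F1.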

CONCLUSIONS

We trained and externally validated an efficient NLP model to detect incident VTE events longitudinally. We believe its adoption will accelerate thrombosis research by improving VTE detection at scale and decreasing the time and expense involved with manual chart review in big data epidemiological studies.
