Development and Validation of VTE-BERT Natural Language Processing Model for Venous Thromboembolism.

Author information

Jafari Omid, Ma Shengling, Lam Barbara D, Jiang Jun Y, Zhou Emily, Ranjan Mrinal, Ryu Justine, Bandyo Raka, Maghsoudi Arash, Peng Bo, Amos Christopher I, Oluyomi Abiodun, Fillmore Nathanael R, La Jennifer, Li Ang

Affiliations

Section of Hematology-Oncology, Baylor College of Medicine, Houston, TX.

Division of Hematology & Oncology, Fred Hutch Cancer Center, University of Washington.

Publication information

J Thromb Haemost. 2025 Aug 1. doi: 10.1016/j.jtha.2025.07.021.

DOI:10.1016/j.jtha.2025.07.021

PMID:40754035

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12360494/

Abstract

BACKGROUND

Accurate and rapid phenotyping of venous thromboembolism (VTE) in longitudinal studies is important. A natural language processing (NLP) tool externally validated in representative patients is lacking.

METHODS

We designed a novel NLP platform, NLPMed, to assist thrombosis researchers with data preprocessing, phenotype annotation, language model finetuning, and NLP application. Utilizing clinical notes, discharge summaries, and radiology reports in patients with cancer from two healthcare institutions, we finetuned Bio_ClinicalBERT to develop VTE-BERT. The new model was trained to detect acute VTE events and their anatomical locations longitudinally. We internally and externally validated the model's performance in two randomly sampled cohorts of patients with advanced cancer.

RESULTS

The training cohort consisted of 715 patients and 14,013 annotated notes with ≥1 VTE keyword from the Harris Health System (HHS). The internal validation cohort included 400 additional patients with 7,190 VTE keyword-containing notes from HHS. The external validation cohort included 400 patients with 7,371 VTE keyword-containing notes from the National Veterans Affairs Healthcare System. VTE-BERT was trained until it reached a precision of 95% and recall of 98% on the patient level. Using independent datasets, the model achieved precision and recall of 95% and 91% in internal validation and of 85% and 92% in external validation.
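Because the results above are reported at the patient level while the model reads individual notes, it may help to see how note-level outputs could be rolled up before precision and recall are computed. The following is a minimal sketch, not the authors' method: the abstract does not state the aggregation rule, so the "a patient is positive if any note is positive" rule, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def patient_level_labels(note_labels):
    """Collapse (patient_id, is_vte) note-level pairs to one flag per patient.

    ASSUMPTION: a patient counts as VTE-positive if any of their notes is
    flagged. The abstract does not specify the rule actually used.
    """
    by_patient = defaultdict(bool)
    for patient_id, is_vte in note_labels:
        by_patient[patient_id] = by_patient[patient_id] or bool(is_vte)
    return dict(by_patient)


def precision_recall(predicted, reference):
    """Compute patient-level precision and recall from two {patient_id: bool} maps."""
    patients = set(predicted) | set(reference)
    tp = sum(1 for p in patients if predicted.get(p, False) and reference.get(p, False))
    fp = sum(1 for p in patients if predicted.get(p, False) and not reference.get(p, False))
    fn = sum(1 for p in patients if not predicted.get(p, False) and reference.get(p, False))
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical toy data: note-level model output and chart-review reference labels.
model_notes = [("A", False), ("A", True), ("B", False), ("C", True), ("D", False)]
chart_review = {"A": True, "B": False, "C": False, "D": True}

predicted = patient_level_labels(model_notes)
precision, recall = precision_recall(predicted, chart_review)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example patient A is a true positive, C a false positive, and D a false negative, so both metrics come out at 0.50. A stricter aggregation rule (for example, requiring two flagged notes) would trade recall for precision.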

CONCLUSIONS

We trained and externally validated an efficient NLP model to detect incident VTE events longitudinally. We believe its adoption will accelerate thrombosis research by improving VTE detection at scale and decreasing the time and expense involved with manual chart review in big data epidemiological studies.
