Mapping vaccine names in clinical trials to vaccine ontology using cascaded fine-tuned domain-specific language models.

Affiliations

Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, 32224, USA.

McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.

Publication information

J Biomed Semantics. 2024 Aug 10;15(1):14. doi: 10.1186/s13326-024-00318-x.

DOI:10.1186/s13326-024-00318-x
PMID:39123237
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11316402/
Abstract

BACKGROUND

Vaccines have revolutionized public health by providing protection against infectious diseases. They stimulate the immune system and generate memory cells to defend against targeted diseases. Clinical trials evaluate vaccine performance, including dosage, administration routes, and potential side effects. ClinicalTrials.gov is a valuable repository of clinical trial information, but its vaccine data lack standardization, which creates challenges for automatic concept mapping, vaccine-related knowledge development, evidence-based decision-making, and vaccine surveillance.

RESULTS

In this study, we developed a cascaded framework that draws on multiple domain knowledge sources, including clinical trials, the Unified Medical Language System (UMLS), and the Vaccine Ontology (VO), to improve the performance of domain-specific language models in automatically mapping vaccine names in clinical trials to VO. VO is a community-based ontology developed to promote vaccine data standardization, integration, and computer-assisted reasoning.

Our methodology involved extracting and annotating data from various sources. We then continued pre-training the PubMedBERT model, producing CTPubMedBERT. Next, we enhanced CTPubMedBERT by incorporating SAPBERT, which was pretrained using the UMLS, resulting in CTPubMedBERT + SAPBERT. Fine-tuning on the Vaccine Ontology corpus and vaccine data from clinical trials then yielded the CTPubMedBERT + SAPBERT + VO model. Finally, we combined a collection of pre-trained models with a weighted rule-based ensemble approach to normalize the vaccine corpus and improve the accuracy of the process.

Ranking in concept normalization means prioritizing and ordering candidate concepts to identify the most suitable match for a given context. We ranked the top 10 candidate concepts for each mention. Our experimental results show that the proposed cascaded framework consistently outperformed existing effective baselines on vaccine mapping, achieving 71.8% top-1 accuracy and 90.0% top-10 accuracy.
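The ranking and ensemble steps described above can be illustrated with a small sketch. The abstract does not give the actual weights or rules, so the code below assumes the simplest possible combination: a weighted sum of per-model similarity scores, followed by a top-k accuracy calculation. The function names, the weights, and the concept IDs (`VO_A`, `VO_B`, and so on) are hypothetical placeholders and are not taken from the paper.

```python
from collections import defaultdict


def ensemble_rank(model_scores, weights, top_k=10):
    """Combine per-model candidate scores into one ranked list.

    model_scores: {model_name: {concept_id: similarity score}}
    weights:      {model_name: non-negative weight} (hypothetical values)
    """
    combined = defaultdict(float)
    for model, scores in model_scores.items():
        w = weights.get(model, 0.0)
        for concept_id, score in scores.items():
            combined[concept_id] += w * score
    # Sort by descending combined score; break ties by concept ID.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept_id for concept_id, _ in ranked[:top_k]]


def top_k_accuracy(ranked_lists, gold_ids, k):
    """Fraction of mentions whose gold concept is among the first k candidates."""
    if not gold_ids:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


# Toy data: scores and concept IDs are placeholders, not real VO identifiers.
mentions = [
    {
        "gold": "VO_A",
        "scores": {
            "CTPubMedBERT": {"VO_A": 0.80, "VO_B": 0.85},
            "CTPubMedBERT+SAPBERT": {"VO_A": 0.90, "VO_B": 0.70},
            "CTPubMedBERT+SAPBERT+VO": {"VO_A": 0.95, "VO_B": 0.60},
        },
    },
    {
        "gold": "VO_C",
        "scores": {
            "CTPubMedBERT": {"VO_C": 0.40, "VO_D": 0.90},
            "CTPubMedBERT+SAPBERT": {"VO_C": 0.50, "VO_D": 0.80},
            "CTPubMedBERT+SAPBERT+VO": {"VO_C": 0.60, "VO_D": 0.70},
        },
    },
]
# Assumed weights that favor the most specialized model.
weights = {
    "CTPubMedBERT": 0.2,
    "CTPubMedBERT+SAPBERT": 0.3,
    "CTPubMedBERT+SAPBERT+VO": 0.5,
}

ranked_lists = [ensemble_rank(m["scores"], weights) for m in mentions]
gold_ids = [m["gold"] for m in mentions]
top1 = top_k_accuracy(ranked_lists, gold_ids, k=1)
top10 = top_k_accuracy(ranked_lists, gold_ids, k=10)
print(f"top-1 accuracy: {top1:.2f}, top-10 accuracy: {top10:.2f}")
```

In this toy example the first mention is resolved correctly at rank 1, while the second mention's gold concept only appears at rank 2, so top-1 accuracy is lower than top-10 accuracy. The same gap appears, at a much larger scale, in the reported 71.8% and 90.0% figures.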

CONCLUSION

This study details a cascaded framework of fine-tuned domain-specific language models that improves the mapping of vaccine names in clinical trials to VO. By leveraging domain-specific information and applying weighted rule-based ensembles of different pre-trained BERT models, the framework can significantly enhance this mapping.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcb/11316402/66338bbc5681/13326_2024_318_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcb/11316402/744fd9d8aac7/13326_2024_318_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcb/11316402/99e527365cd0/13326_2024_318_Fig3_HTML.jpg
