Mapping Vaccine Names in Clinical Trials to Vaccine Ontology using Cascaded Fine-Tuned Domain-Specific Language Models.

Author information

Li Jianfu, Li Yiming, Pan Yuanyi, Guo Jinjing, Sun Zenan, Li Fang, He Yongqun, Tao Cui

Affiliations

The University of Texas Health Science Center at Houston.

University of Michigan Medical School.

Publication information

Res Sq. 2023 Sep 27:rs.3.rs-3362256. doi: 10.21203/rs.3.rs-3362256/v1.

DOI: 10.21203/rs.3.rs-3362256/v1
PMID: 37841880
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10571639/
Abstract

BACKGROUND

Vaccines have revolutionized public health by providing protection against infectious diseases. They stimulate the immune system and generate memory cells that defend against the targeted diseases. Clinical trials evaluate vaccine performance, including dosage, administration routes, and potential side effects. ClinicalTrials.gov is a valuable repository of clinical trial information, but the vaccine data it contains are not standardized. This lack of standardization creates challenges for automatic concept mapping, vaccine-related knowledge development, evidence-based decision-making, and vaccine surveillance.

RESULTS

In this study, we developed a cascaded framework that draws on multiple domain knowledge sources, including clinical trials, the Unified Medical Language System (UMLS), and the Vaccine Ontology (VO), to improve the performance of domain-specific language models in automatically mapping vaccine names in clinical trials to VO. VO is a community-based ontology developed to promote vaccine data standardization, integration, and computer-assisted reasoning.

Our methodology involved extracting and annotating data from various sources. We then further pre-trained the PubMedBERT model, producing CTPubMedBERT. Next, we enhanced CTPubMedBERT by incorporating SAPBERT, a model pre-trained using UMLS, to obtain CTPubMedBERT + SAPBERT. We refined this model further by fine-tuning it on the Vaccine Ontology corpus and on vaccine data from clinical trials, yielding the CTPubMedBERT + SAPBERT + VO model. Finally, we combined a collection of pre-trained models through a weighted rule-based ensemble to normalize the vaccine corpus and improve the accuracy of the process.

Ranking in concept normalization means prioritizing and ordering candidate concepts to identify the most suitable match for a given context. We ranked the top 10 candidate concepts. Our experimental results show that the proposed cascaded framework consistently outperformed effective existing baselines on vaccine mapping, achieving 71.8% top-1 accuracy and 90.0% top-10 accuracy.
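The ranking step and the two reported metrics can be illustrated with a small sketch. It assumes, hypothetically, that each vaccine mention and each candidate VO concept has already been encoded as an embedding vector (for example by one of the BERT models above) and that candidates are ranked by cosine similarity. The vectors, concept labels, and helper names (`rank_concepts`, `top_k_accuracy`) are invented for illustration and are not the authors' code. Top-k accuracy is the fraction of mentions whose gold concept appears among the first k ranked candidates. The toy example cuts the list at k = 2 instead of 10.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_concepts(mention_vec: Sequence[float],
                  concept_vecs: Dict[str, Sequence[float]],
                  k: int = 10) -> List[str]:
    """Return the IDs of the k concepts most similar to the mention, best first."""
    scored = sorted(concept_vecs.items(),
                    key=lambda item: cosine(mention_vec, item[1]),
                    reverse=True)
    return [concept_id for concept_id, _ in scored[:k]]


def top_k_accuracy(ranked: List[List[str]], gold: List[str], k: int) -> float:
    """Fraction of mentions whose gold concept is among the first k candidates."""
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)


# Hypothetical toy embeddings: three concepts, three mentions.
concepts = {
    "flu_vaccine": [1.0, 0.0, 0.0],
    "hpv_vaccine": [0.0, 1.0, 0.0],
    "bcg_vaccine": [0.0, 0.0, 1.0],
}
mentions = [[0.9, 0.1, 0.0], [0.2, 0.1, 0.9], [0.6, 0.5, 0.0]]
gold = ["flu_vaccine", "bcg_vaccine", "hpv_vaccine"]

ranked = [rank_concepts(m, concepts, k=2) for m in mentions]
top1 = top_k_accuracy(ranked, gold, k=1)
top2 = top_k_accuracy(ranked, gold, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")  # top-1: 0.667, top-2: 1.000
```

With these toy vectors, two of the three gold concepts are ranked first, so top-1 accuracy is 2/3. The third mention's gold concept (`hpv_vaccine`) is ranked second, so all three fall within the first two candidates and top-2 accuracy is 1.0.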

CONCLUSION

This study provides detailed insight into a cascaded framework of fine-tuned domain-specific language models that improves the mapping of vaccine names in clinical trials to VO. By effectively leveraging domain-specific information and applying weighted rule-based ensembles of different pre-trained BERT models, our framework can significantly enhance this mapping.
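The abstract does not spell out the rules of the weighted ensemble. The following is therefore only a sketch of one common way to merge ranked candidate lists from several models: a weighted Borda count, substituted here as an assumption rather than the authors' method. Each model gives points to a candidate according to its position in that model's list, and the points are scaled by a per-model weight. The model names come from the abstract, but the weights, candidate labels, and the `weighted_ensemble` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_ensemble(rankings: Dict[str, List[str]],
                      weights: Dict[str, float],
                      k: int = 10) -> List[str]:
    """Merge per-model candidate rankings with a weighted Borda count.

    A candidate at 0-based position r in a model's list of length n earns
    weights[model] * (n - r) points; points are summed across models.
    Models missing from `weights` get weight 0.
    """
    scores: Dict[str, float] = defaultdict(float)
    for model, candidates in rankings.items():
        w = weights.get(model, 0.0)
        n = len(candidates)
        for r, concept in enumerate(candidates):
            scores[concept] += w * (n - r)
    # Highest score first; ties broken alphabetically so the output is deterministic.
    merged = sorted(scores, key=lambda c: (-scores[c], c))
    return merged[:k]


# Hypothetical candidate lists and weights, for illustration only.
rankings = {
    "CTPubMedBERT": ["concept_A", "concept_B", "concept_C"],
    "CTPubMedBERT + SAPBERT": ["concept_B", "concept_A", "concept_C"],
    "CTPubMedBERT + SAPBERT + VO": ["concept_B", "concept_C", "concept_A"],
}
weights = {
    "CTPubMedBERT": 1.0,
    "CTPubMedBERT + SAPBERT": 1.5,
    "CTPubMedBERT + SAPBERT + VO": 2.0,
}

top = weighted_ensemble(rankings, weights, k=2)
print(top)  # ['concept_B', 'concept_A']
```

In this example, the two SAPBERT-based models both rank `concept_B` first and carry larger (hypothetical) weights. As a result, `concept_B` (12.5 points) overtakes `concept_A` (8 points), which only the base model ranks first.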

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f22/10571639/65172cab0574/nihpp-rs3362256v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f22/10571639/a3ca0c1d0ea0/nihpp-rs3362256v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f22/10571639/12969e1f6820/nihpp-rs3362256v1-f0003.jpg
