
Linking Cancer Clinical Trials to their Result Publications.

Author information

Pan Evan, Roberts Kirk

Affiliations

Department of Computer Science & Engineering, Texas A&M University, College Station, TX, USA.

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Publication information

AMIA Jt Summits Transl Sci Proc. 2024 May 31;2024:642-651. eCollection 2024.

PMID: 38827077
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11141816/
Abstract

The results of clinical trials are a valuable source of evidence for researchers, policy makers, and healthcare professionals. However, online trial registries do not always contain links to the publications that report on their results, instead requiring a time-consuming manual search. Here, we explored the application of pre-trained transformer-based language models to automatically identify result-reporting publications of cancer clinical trials by computing dense vectors and performing semantic search. Models were fine-tuned on text data from trial registry fields and article metadata using a contrastive learning approach. The best performing model was PubMedBERT, which achieved a mean average precision of 0.592 and ranked 70.3% of a trial's publications in the top 5 results when tested on the holdout test trials. Our results suggest that semantic search using embeddings from transformer models may be an effective approach to the task of linking trials to their publications.
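
The retrieval and evaluation steps described in the abstract can be illustrated with a minimal sketch. It ranks candidate articles by cosine similarity to a trial vector, then computes average precision and recall at k, the per-trial quantities behind the mean average precision and top-5 figures reported above. Everything here is an assumption for illustration: the two-dimensional vectors are toy values rather than PubMedBERT embeddings, and the function names (`rank_articles`, `average_precision`, `recall_at_k`) are hypothetical, not taken from the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_articles(trial_vec, article_vecs):
    """Return article ids sorted by descending similarity to the trial vector."""
    scores = {aid: cosine(trial_vec, vec) for aid, vec in article_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def average_precision(ranking, relevant):
    """Average of precision values taken at the rank of each relevant article."""
    hits, total = 0, 0.0
    for rank, aid in enumerate(ranking, start=1):
        if aid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def recall_at_k(ranking, relevant, k=5):
    """Fraction of the trial's relevant articles that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


# Toy data (assumed): one trial vector and four candidate article vectors.
trial_vec = [1.0, 0.0]
article_vecs = {
    "A": [0.9, 0.1],
    "B": [0.0, 1.0],
    "C": [0.7, 0.7],
    "D": [1.0, 0.2],
}
relevant = {"A", "C"}  # articles that report this trial's results

ranking = rank_articles(trial_vec, article_vecs)
ap = average_precision(ranking, relevant)
print(ranking, round(ap, 3), recall_at_k(ranking, relevant, k=2))
```

Averaging `average_precision` over all test trials would give the mean average precision. In a real pipeline the vectors would come from a fine-tuned encoder applied to registry fields and article metadata, and the exhaustive comparison would typically be replaced by a vector index.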



