Medical concept normalization in clinical trials with drug and disease representation learning.

Affiliation

R&D department, Insilico Medicine Hong Kong, 999077 Pak Shek Kok, Hong Kong.

Publication information

Bioinformatics. 2021 Nov 5;37(21):3856-3864. doi: 10.1093/bioinformatics/btab474.

DOI:10.1093/bioinformatics/btab474
PMID:34213526
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8570806/
Abstract

MOTIVATION

Clinical trials are an essential stage of every drug development program before a treatment becomes available to patients. Despite the importance of well-structured clinical trial databases and their tremendous value for drug discovery and development, such databases are very rare. Presently, large-scale information on clinical trials is stored in clinical trial registers, which are relatively structured, but mappings to external databases of drugs and diseases are increasingly lacking. Producing such links precisely would enable us to interrogate richer, harmonized datasets for invaluable insights.

RESULTS

We present a neural approach for medical concept normalization of diseases and drugs. Our two-stage approach is based on Bidirectional Encoder Representations from Transformers (BERT). In the training stage, we optimize the relative similarity of mentions and concept names from a terminology via triplet loss. In the inference stage, we obtain the closest concept name representation in a common embedding space to a given mention representation. We performed a set of experiments on a dataset of abstracts and a real-world dataset of trial records with interventions and conditions mapped to drug and disease terminologies. The latter includes mentions associated with one or more concepts (in-KB) or zero (out-of-KB, nil prediction). Experiments show that our approach significantly outperforms baseline and state-of-the-art architectures. Moreover, we demonstrate that our approach is effective in knowledge transfer from the scientific literature to clinical trial data.
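The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D vectors stand in for BERT embeddings, and the cosine distance, the margin of 0.2, the distance threshold used for nil prediction, and the concept IDs are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Training-stage objective (sketch).

    anchor   : embedding of a mention
    positive : embedding of a name of the correct concept
    negative : embedding of a name of a wrong concept
    The loss is zero once the positive is closer than the negative
    by at least `margin` (an assumed value).
    """
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


def normalize(mention_vec, concept_vecs, nil_threshold=0.5):
    """Inference stage (sketch): nearest concept name in the shared space.

    Returns (concept_id, distance). If even the nearest concept name is
    farther than `nil_threshold` (an assumed rule), the mention is treated
    as out-of-KB and concept_id is None.
    """
    best_id, best_dist = None, float("inf")
    for concept_id, vec in concept_vecs.items():
        dist = cosine_distance(mention_vec, vec)
        if dist < best_dist:
            best_id, best_dist = concept_id, dist
    if best_dist > nil_threshold:
        return None, best_dist
    return best_id, best_dist


# Hypothetical concept-name embeddings keyed by made-up concept IDs.
concepts = {"C_ASPIRIN": [1.0, 0.0], "C_IBUPROFEN": [0.0, 1.0]}

in_kb_id, _ = normalize([0.9, 0.1], concepts)      # close to C_ASPIRIN
out_kb_id, _ = normalize([-1.0, -1.0], concepts)   # far from every concept
```

In this toy setup a mention embedded near a concept name is linked to that concept, while a mention far from every concept name yields `None`, mirroring the in-KB and out-of-KB cases in the trial dataset.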

AVAILABILITY AND IMPLEMENTATION

We make code and data freely available at https://github.com/insilicomedicine/DILBERT.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7459/8570806/4153637724ff/btab474f1.jpg

