CODER: Knowledge-infused cross-lingual medical term embedding for term normalization.

Affiliations

Center for Statistical Science, Tsinghua University, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, China.

Institute of Medical Information, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China.

Publication information

J Biomed Inform. 2022 Feb;126:103983. doi: 10.1016/j.jbi.2021.103983. Epub 2022 Jan 4.

DOI: 10.1016/j.jbi.2021.103983
PMID: 34990838

Abstract

OBJECTIVE

This paper aims to propose knowledge-aware embedding, a critical tool for medical term normalization.

METHODS

We develop CODER (Cross-lingual knowledge-infused medical term embedding) via contrastive learning based on a medical knowledge graph (KG) named the Unified Medical Language System, and similarities are calculated utilizing both terms and relation triplets from the KG. Training with relations injects medical knowledge into embeddings and can potentially improve their performance as machine learning features.
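The abstract does not specify the loss function or how relation triplets are scored, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an InfoNCE-style contrastive loss and a TransE-style translation (head + relation ≈ tail) as stand-ins; the vectors, the term names, and the `may_treat` relation are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in, not the paper's stated loss).

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def translate(head, relation):
    """TransE-style composition (assumption): head + relation should land near tail."""
    return [h + r for h, r in zip(head, relation)]


# Hypothetical toy embeddings; a real model would produce these from text.
heart_attack = [0.9, 0.1, 0.0]
myocardial_infarction = [0.8, 0.2, 0.1]  # synonym of heart_attack
fracture = [0.0, 0.1, 0.9]               # unrelated term
aspirin = [0.5, 0.9, 0.0]
may_treat = [0.4, -0.8, 0.0]             # hypothetical relation vector

# Term-term signal: synonyms are pulled together, unrelated terms pushed apart.
term_loss = info_nce(heart_attack, myocardial_infarction, [fracture])

# Relation signal: (aspirin, may_treat, heart_attack) as a knowledge-graph triplet.
relation_loss = info_nce(translate(aspirin, may_treat), heart_attack, [fracture])

total_loss = term_loss + relation_loss
```

Both loss terms come out small here because the toy vectors were chosen to agree with the synonym pair and the triplet; swapping the positive and the negative makes the loss large, which is the signal a training loop would minimize.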

RESULTS

We evaluate CODER based on zero-shot term normalization, semantic similarity, and relation classification benchmarks, and the results show that CODER outperforms various state-of-the-art biomedical word embeddings, concept embeddings, and contextual embeddings.

CONCLUSION

CODER embeddings excellently reflect semantic similarity and relatedness of medical concepts. One can use CODER for embedding-based medical term normalization or to provide features for machine learning. Similar to other pretrained language models, CODER can also be fine-tuned for specific tasks. Codes and models are available at https://github.com/GanjinZero/CODER.
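Embedding-based term normalization, as mentioned above, can be sketched as a nearest-neighbor lookup: embed the input mention, embed every term in a vocabulary, and return the concept whose term is most similar. The example below is a hedged illustration only. `toy_embed` is a character-trigram stand-in, not the CODER model, and the concept identifiers and vocabulary are hypothetical placeholders rather than real UMLS entries.

```python
import math
from collections import Counter


def toy_embed(term, n=3):
    """Stand-in encoder: bag of character trigrams (NOT the CODER model)."""
    padded = f"  {term.lower()}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def normalize(mention, vocabulary, embed=toy_embed):
    """Return (concept_id, matched_term, score) for the closest vocabulary term."""
    query = embed(mention)
    best = None
    for concept_id, terms in vocabulary.items():
        for term in terms:
            score = cosine(query, embed(term))
            if best is None or score > best[2]:
                best = (concept_id, term, score)
    return best


# Hypothetical vocabulary: placeholder concept IDs mapped to their known terms.
vocabulary = {
    "CONCEPT_A": ["myocardial infarction", "heart attack"],
    "CONCEPT_B": ["bone fracture", "broken bone"],
}

# A misspelled mention is mapped to the concept of its nearest term.
result = normalize("hart attack", vocabulary)
```

A trigram encoder only captures surface overlap, so it cannot link "heart attack" to "myocardial infarction" on its own. Replacing `toy_embed` with a function that returns vectors from a pretrained term-embedding model would be the natural substitution, with `cosine` adapted to dense vectors.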

Similar articles

1
CODER: Knowledge-infused cross-lingual medical term embedding for term normalization.
J Biomed Inform. 2022 Feb;126:103983. doi: 10.1016/j.jbi.2021.103983. Epub 2022 Jan 4.
2
Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts.
J Am Med Inform Assoc. 2020 Oct 1;27(10):1538-1546. doi: 10.1093/jamia/ocaa136.
3
A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
4
Improved biomedical word embeddings in the transformer era.
J Biomed Inform. 2021 Aug;120:103867. doi: 10.1016/j.jbi.2021.103867. Epub 2021 Jul 18.
5
Evaluating semantic relations in neural word embeddings with biomedical and general domain knowledge bases.
BMC Med Inform Decis Mak. 2018 Jul 23;18(Suppl 2):65. doi: 10.1186/s12911-018-0630-x.
6
Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts.
J Biomed Inform. 2020 Nov;111:103581. doi: 10.1016/j.jbi.2020.103581. Epub 2020 Oct 1.
7
Medical concept normalization in French using multilingual terminologies and contextual embeddings.
J Biomed Inform. 2021 Feb;114:103684. doi: 10.1016/j.jbi.2021.103684. Epub 2021 Jan 12.
8
Summarization of biomedical articles using domain-specific word embeddings and graph ranking.
J Biomed Inform. 2020 Jul;107:103452. doi: 10.1016/j.jbi.2020.103452. Epub 2020 May 19.
9
BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1844-1855. doi: 10.1093/jamia/ocae029.
10
Fine-Tuning Word Embeddings for Hierarchical Representation of Data Using a Corpus and a Knowledge Base for Various Machine Learning Applications.
Comput Math Methods Med. 2021 Nov 16;2021:9761163. doi: 10.1155/2021/9761163. eCollection 2021.

Cited by

1
Advancing the Use of Longitudinal Electronic Health Records: Tutorial for Uncovering Real-World Evidence in Chronic Disease Outcomes.
J Med Internet Res. 2025 May 12;27:e71873. doi: 10.2196/71873.
2
Improving Phenotyping of Patients With Immune-Mediated Inflammatory Diseases Through Automated Processing of Discharge Summaries: Multicenter Cohort Study.
JMIR Med Inform. 2025 Apr 9;13:e68704. doi: 10.2196/68704.
3
ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis.
J Biomed Inform. 2025 Feb;162:104761. doi: 10.1016/j.jbi.2024.104761. Epub 2025 Jan 23.
4
Robust Automated Harmonization of Heterogeneous Data Through Ensemble Machine Learning: Algorithm Development and Validation Study.
JMIR Med Inform. 2025 Jan 22;13:e54133. doi: 10.2196/54133.
5
Heterogeneous entity representation for medicinal synergy prediction.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btae750.
6
DOME: Directional medical embedding vectors from Electronic Health Records.
J Biomed Inform. 2025 Feb;162:104768. doi: 10.1016/j.jbi.2024.104768. Epub 2025 Jan 2.
7
xMEN: a modular toolkit for cross-lingual medical entity normalization.
JAMIA Open. 2024 Dec 26;8(1):ooae147. doi: 10.1093/jamiaopen/ooae147. eCollection 2025 Feb.
8
Multisource representation learning for pediatric knowledge extraction from electronic health records.
NPJ Digit Med. 2024 Nov 13;7(1):319. doi: 10.1038/s41746-024-01320-4.
9
Use of SNOMED CT in Large Language Models: Scoping Review.
JMIR Med Inform. 2024 Oct 7;12:e62924. doi: 10.2196/62924.
10
Retrieval-Based Diagnostic Decision Support: Mixed Methods Study.
JMIR Med Inform. 2024 Jun 19;12:e50209. doi: 10.2196/50209.