xMEN: a modular toolkit for cross-lingual medical entity normalization.

Authors

Borchert Florian, Llorca Ignacio, Roller Roland, Arnrich Bert, Schapranow Matthieu-P

Affiliations

Hasso Plattner Institute for Digital Engineering, University of Potsdam, Potsdam 14482, Germany.

Speech and Language Technology Lab, German Research Center for Artificial Intelligence (DFKI), Berlin 10559, Germany.

Publication information

JAMIA Open. 2024 Dec 26;8(1):ooae147. doi: 10.1093/jamiaopen/ooae147. eCollection 2025 Feb.

DOI:10.1093/jamiaopen/ooae147
PMID:39735785
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11671143/

Abstract

OBJECTIVE

To improve performance of medical entity normalization across many languages, especially when fewer language resources are available compared to English.

MATERIALS AND METHODS

We propose xMEN, a modular system for cross-lingual (x) medical entity normalization (MEN), accommodating both low- and high-resource scenarios. To account for the scarcity of aliases for many target languages and terminologies, we leverage multilingual aliases via cross-lingual candidate generation. For candidate ranking, we incorporate a trainable cross-encoder (CE) model if annotations for the target task are available. To balance the output of general-purpose candidate generators with subsequent trainable re-rankers, we introduce a novel rank regularization term in the loss function for training CEs. For re-ranking without gold-standard annotations, we introduce multiple new weakly labeled datasets using machine translation and projection of annotations from a high-resource language.
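The abstract does not state the form of the rank regularization term, so the following is only a minimal sketch of one plausible reading: a softmax cross-entropy over the candidate list, plus a weighted second cross-entropy term that keeps probability mass on the candidate generator's top-ranked candidate. The function names, the weight `lam`, and the convention that index 0 holds the generator's top candidate are all assumptions made for illustration, not details taken from the paper or the toolkit.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_regularized_loss(scores, gold_index, lam=0.1):
    """Hypothetical rank-regularized re-ranking loss (illustrative only).

    scores:     cross-encoder scores, ordered as the candidate generator
                ranked them (assumption: index 0 is the generator's top pick).
    gold_index: position of the gold-standard concept in that list.
    lam:        assumed weight of the regularization term.
    """
    probs = softmax(scores)
    # Standard term: push probability toward the gold candidate.
    gold_term = -math.log(probs[gold_index])
    # Assumed regularizer: stay close to the generator's original ranking
    # by also rewarding probability on its top-ranked candidate.
    rank_term = -math.log(probs[0])
    return gold_term + lam * rank_term


if __name__ == "__main__":
    cross_encoder_scores = [1.2, 0.3, 2.5, -0.7]
    print(rank_regularized_loss(cross_encoder_scores, gold_index=2))
```

With `lam=0` the sketch reduces to plain cross-entropy re-ranking; larger values make the re-ranker more reluctant to overturn the generator's ordering, which is the balancing behaviour the abstract describes.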

RESULTS

xMEN improves the state-of-the-art performance across various benchmark datasets for several European languages. Weakly supervised CEs are effective when no training data is available for the target task.

DISCUSSION

We perform an analysis of normalization errors, revealing that complex entities are still challenging to normalize. New modules and benchmark datasets can be easily integrated in the future.

CONCLUSION

xMEN exhibits strong performance for medical entity normalization in many languages, even when no labeled data and few terminology aliases for the target language are available. To enable reproducible benchmarks in the future, we make the system available as an open-source Python toolkit. The pre-trained models and source code are available online: https://github.com/hpi-dhc/xmen.
