
MedLexSp - a medical lexicon for Spanish medical natural language processing.

Affiliation

Instituto de Lengua, Literatura y Antropología (ILLA), CSIC (Spanish National Research Council), Albasanz 26-28, 28037, Madrid, Spain.

Publication information

J Biomed Semantics. 2023 Feb 2;14(1):2. doi: 10.1186/s13326-022-00281-5.

DOI:10.1186/s13326-022-00281-5
PMID:36732862
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9892682/
Abstract

BACKGROUND

Medical lexicons enable the natural language processing (NLP) of health texts. Lexicons gather terms and concepts from thesauri and ontologies, and linguistic data for part-of-speech (PoS) tagging, lemmatization or natural language generation. To date, there is no such type of resource for Spanish.

CONSTRUCTION AND CONTENT

This article describes a unified medical lexicon for Medical Natural Language Processing in Spanish. MedLexSp includes terms and inflected word forms with PoS information and Unified Medical Language System (UMLS) semantic types, groups and Concept Unique Identifiers (CUIs). To create it, we used NLP techniques and domain corpora (e.g. MedlinePlus). We also collected terms from the Dictionary of Medical Terms from the Spanish Royal Academy of Medicine, the Medical Subject Headings (MeSH), the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), the Medical Dictionary for Regulatory Activities Terminology (MedDRA), the International Classification of Diseases, version 10, the Anatomical Therapeutic Chemical Classification, the National Cancer Institute (NCI) Dictionary, the Online Mendelian Inheritance in Man (OMIM) and OrphaData. Terms related to COVID-19 were assembled by applying a similarity-based approach with word embeddings trained on a large corpus. MedLexSp includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs, and number/gender variants), and 42 958 UMLS CUIs. We report two use cases of MedLexSp. First, the lexicon was applied to pre-annotate a corpus of 1200 texts related to clinical trials. Second, it was used for PoS tagging and lemmatizing texts about clinical cases. MedLexSp improved the scores for PoS tagging and lemmatization compared to the default spaCy and Stanza Python libraries.
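
The similarity-based collection of COVID-19 terms mentioned above can be illustrated with a minimal sketch. This is not the authors' code: the three-dimensional vectors, the 0.95 threshold and the function name `expand_terms` are invented for illustration, and real embeddings would be trained on a large corpus as the abstract states. The sketch ranks vocabulary entries by cosine similarity to a seed term and keeps those above the threshold as candidate terms for manual review.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration only (assumption, not real embeddings).
EMBEDDINGS = {
    "covid-19": [0.9, 0.1, 0.0],
    "coronavirus": [0.8, 0.2, 0.1],
    "sars-cov-2": [0.85, 0.15, 0.05],
    "fractura": [0.0, 0.9, 0.4],
    "aspirina": [0.1, 0.2, 0.9],
}

def expand_terms(seed, embeddings, threshold=0.95):
    """Return (term, score) pairs whose similarity to the seed meets the
    threshold, most similar first. The threshold value is hypothetical."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed
    ]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

candidates = expand_terms("covid-19", EMBEDDINGS)
for term, score in candidates:
    print(f"{term}\t{score:.3f}")
```

With these toy vectors only the two virus-related entries pass the threshold; unrelated terms such as "fractura" score far lower and are dropped.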

CONCLUSIONS

The lexicon is distributed in a delimiter-separated value file; an XML file with the Lexical Markup Framework; a lemmatizer module for the spaCy and Stanza libraries; and complementary Lexical Record (LR) files. The embeddings and code to extract COVID-19 terms, and the spaCy and Stanza lemmatizers enriched with medical terms are provided in a public repository.
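
As a rough sketch of how a delimiter-separated lexicon could drive lemmatization, the example below loads a few rows and looks up lemmas by word form and PoS tag. The column layout (`form|lemma|pos|cui`), the pipe delimiter and the CUI values are assumptions made for illustration; the abstract does not specify the actual file layout, and this is not the distributed spaCy or Stanza module.

```python
import csv
import io

# Hypothetical sample rows; the column names and the placeholder CUIs
# (C0000001 ...) are invented and do not come from MedLexSp itself.
SAMPLE = """\
form|lemma|pos|cui
fiebres|fiebre|NOUN|C0000001
fiebre|fiebre|NOUN|C0000001
pulmones|pulmón|NOUN|C0000002
vacunados|vacunar|VERB|C0000003
"""

def load_lexicon(text, delimiter="|"):
    """Map (lowercased form, PoS) to (lemma, CUI)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    lexicon = {}
    for row in reader:
        key = (row["form"].lower(), row["pos"])
        lexicon[key] = (row["lemma"], row["cui"])
    return lexicon

def lemmatize(tagged_tokens, lexicon):
    """Look up each (form, PoS) pair; fall back to the lowercased form
    when the lexicon has no entry."""
    lemmas = []
    for form, pos in tagged_tokens:
        lemma, _cui = lexicon.get((form.lower(), pos), (form.lower(), None))
        lemmas.append(lemma)
    return lemmas

lexicon = load_lexicon(SAMPLE)
lemmas = lemmatize(
    [("Pulmones", "NOUN"), ("fiebres", "NOUN"), ("xyz", "NOUN")],
    lexicon,
)
print(lemmas)
```

Keying the dictionary on both form and PoS tag lets the same surface form map to different lemmas depending on its tag, which is one plausible reason to combine a lexicon with a tagger.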
