Quality of word and concept embeddings in targetted biomedical domains.

Authors

Giancani Salvatore, Albertoni Riccardo, Catalano Chiara Eva

Affiliations

Institut de Neurosciences de la Timone, Unité Mixte de Recherche 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Faculty of Medicine, 27, Boulevard Jean Moulin, 13385 Marseille Cedex 05, France.

Istituto di Matematica Applicata e Tecnologie Informatiche, Consiglio Nazionale delle Ricerche, Via De Marini 16, 16149 Genova, Italy.

Publication information

Heliyon. 2023 Jun 2;9(6):e16818. doi: 10.1016/j.heliyon.2023.e16818. eCollection 2023 Jun.

DOI:10.1016/j.heliyon.2023.e16818

PMID:37332929

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10272317/

Abstract

Embeddings are fundamental resources often reused for building intelligent systems in the biomedical context. As a result, evaluating the quality of previously trained embeddings and ensuring they cover the desired information is critical for the success of applications. This paper proposes a new evaluation methodology to test the coverage of embeddings against a targeted domain of interest. It defines measures to assess the terminology, similarity, and analogy coverage, which are core aspects of the embeddings. Then, it discusses the experimentation carried out on existing biomedical embeddings in the specific context of pulmonary diseases. The proposed methodology and measures are general and may be applied to any application domain.
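
The abstract names three coverage measures (terminology, similarity, analogy) but does not give their formulas. As a minimal sketch, assuming "coverage" means the share of domain items whose terms all have a vector in the embedding, the idea might look like the following. The function names, the toy vocabulary, and the pulmonary terms are illustrative assumptions, not the authors' definitions.

```python
def terminology_coverage(domain_terms, vocabulary):
    """Share of domain terms that have a vector in the embedding vocabulary."""
    terms = set(domain_terms)
    if not terms:
        return 0.0
    return len(terms & set(vocabulary)) / len(terms)


def tuple_coverage(items, vocabulary):
    """Share of term tuples whose terms are all present in the vocabulary.

    Works for similarity pairs (2 terms) and analogy questions (4 terms).
    """
    vocab = set(vocabulary)
    if not items:
        return 0.0
    covered = [item for item in items if all(term in vocab for term in item)]
    return len(covered) / len(items)


# Hypothetical embedding: term -> vector (the values are placeholders).
embedding = {
    "asthma": [0.9, 0.1],
    "copd": [0.8, 0.2],
    "pneumonia": [0.1, 0.9],
}

# Hypothetical target-domain resources for pulmonary diseases.
domain_terms = ["asthma", "copd", "pneumonia", "bronchiectasis"]
similarity_pairs = [("asthma", "copd"), ("asthma", "bronchiectasis")]
analogy_questions = [("asthma", "bronchodilator", "pneumonia", "antibiotic")]

term_cov = terminology_coverage(domain_terms, embedding)
sim_cov = tuple_coverage(similarity_pairs, embedding)
ana_cov = tuple_coverage(analogy_questions, embedding)
print(term_cov, sim_cov, ana_cov)
```

A measure of this kind only checks whether the embedding can answer a domain question at all. Judging how well it answers (for example, by correlating cosine similarities with expert ratings) would be a separate step that this sketch does not attempt.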

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/05ae/10272317/f71d665ee1b9/gr001.jpg
