Fine-Tuning Word Embeddings for Hierarchical Representation of Data Using a Corpus and a Knowledge Base for Various Machine Learning Applications.

Affiliations

Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia.

Department of Computer Science, University of Liverpool, Liverpool, UK.

Publication information

Comput Math Methods Med. 2021 Nov 16;2021:9761163. doi: 10.1155/2021/9761163. eCollection 2021.

DOI:10.1155/2021/9761163

PMID:34824601

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8610673/

Abstract

Word embedding models have recently shown some capability to encode hierarchical information that exists in textual data. However, such models do not explicitly encode the hierarchical structure that exists among words. In this work, we propose a method to learn hierarchical word embeddings (HWEs) in a specific order to encode the hierarchical information of a knowledge base (KB) in a vector space. To learn the word embeddings, our proposed method considers not only the hypernym relations that exist between words in a KB but also contextual information in a text corpus. The experimental results on various applications, such as supervised and unsupervised hypernymy detection, graded lexical entailment prediction, hierarchical path prediction, and word reconstruction tasks, show the ability of the proposed method to encode the hierarchy. Moreover, the proposed method outperforms previously proposed methods for learning nonspecialised, hypernym-specific, and hierarchical word embeddings on multiple benchmarks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1667/8610673/da43f950b989/CMMM2021-9761163.001.jpg

Similar articles

Fine-Tuning Word Embeddings for Hierarchical Representation of Data Using a Corpus and a Knowledge Base for Various Machine Learning Applications.

Comput Math Methods Med. 2021 Nov 16;2021:9761163. doi: 10.1155/2021/9761163. eCollection 2021.

Jointly learning word embeddings using a corpus and a knowledge base.

PLoS One. 2018 Mar 12;13(3):e0193094. doi: 10.1371/journal.pone.0193094. eCollection 2018.

A comparison of word embeddings for the biomedical natural language processing.

J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.

Evaluating semantic relations in neural word embeddings with biomedical and general domain knowledge bases.

BMC Med Inform Decis Mak. 2018 Jul 23;18(Suppl 2):65. doi: 10.1186/s12911-018-0630-x.

Learning Contextual Hierarchical Structure of Medical Concepts with Poincairé Embeddings to Clarify Phenotypes.

Pac Symp Biocomput. 2019;24:8-17.

CODER: Knowledge-infused cross-lingual medical term embedding for term normalization.

J Biomed Inform. 2022 Feb;126:103983. doi: 10.1016/j.jbi.2021.103983. Epub 2022 Jan 4.

Improved biomedical word embeddings in the transformer era.

J Biomed Inform. 2021 Aug;120:103867. doi: 10.1016/j.jbi.2021.103867. Epub 2021 Jul 18.

Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research.

AMIA Annu Symp Proc. 2018 Dec 5;2018:1405-1414. eCollection 2018.

Visualization of medical concepts represented using word embeddings: a scoping review.

BMC Med Inform Decis Mak. 2022 Mar 29;22(1):83. doi: 10.1186/s12911-022-01822-9.

Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts.

J Am Med Inform Assoc. 2020 Oct 1;27(10):1538-1546. doi: 10.1093/jamia/ocaa136.

Cited by

Retracted: Fine-Tuning Word Embeddings for Hierarchical Representation of Data Using a Corpus and a Knowledge Base for Various Machine Learning Applications.

Comput Math Methods Med. 2023 Dec 6;2023:9867487. doi: 10.1155/2023/9867487. eCollection 2023.
