
A method for constructing word sense embeddings based on word sense induction.

Authors

Sun Yujia, Platoš Jan

Affiliations

Department of Computer Science, Technical University of Ostrava, 17. Listopadu 2172/15, 70800, Ostrava-Poruba, Czech Republic.

Institute of Network Information Security, Hebei GEO University, No. 136 East Huai'an Road, Shijiazhuang, 050031, Hebei, China.

Publication

Sci Rep. 2023 Aug 9;13(1):12945. doi: 10.1038/s41598-023-40062-3.

DOI:10.1038/s41598-023-40062-3
PMID:37558764
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10412592/
Abstract

Polysemy is an inherent characteristic of natural language. In order to make it easier to distinguish between different senses of polysemous words, we propose a method for encoding multiple different senses of polysemous words using a single vector. The method first uses a two-layer bidirectional long short-term memory neural network and a self-attention mechanism to extract the contextual information of polysemous words. Then, a K-means algorithm, which is improved by optimizing the density peaks clustering algorithm based on cosine similarity, is applied to perform word sense induction on the contextual information of polysemous words. Finally, the method constructs the corresponding word sense embedded representations of the polysemous words. The results of the experiments demonstrate that the proposed method produces better word sense induction than Euclidean distance, Pearson correlation, and KL-divergence and more accurate word sense embeddings than mean shift, DBSCAN, spectral clustering, and agglomerative clustering.
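The clustering step the abstract describes, K-means whose initial centroids come from a density-peaks procedure computed on cosine similarity, can be sketched as below. This is an illustrative re-implementation under assumptions, not the authors' code: the function names, the Gaussian-kernel density estimate, and the cutoff parameter `d_c` are hypothetical choices for demonstration.

```python
import numpy as np

def density_peaks_seeds(X, k, d_c=0.5):
    """Pick k initial centroids via density peaks over cosine distance.

    Illustrative sketch: local density rho uses a Gaussian kernel with
    cutoff d_c (an assumed parameterization); delta is each point's
    distance to the nearest point of higher density. Points with large
    rho * delta are taken as cluster seeds.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - Xn @ Xn.T                               # pairwise cosine distance
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0   # local density, excluding self
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]            # points denser than i
        delta[i] = D[i].max() if higher.size == 0 else D[i, higher].min()
    return np.argsort(rho * delta)[::-1][:k]          # top-k density peaks

def spherical_kmeans(X, k, iters=50):
    """K-means on the unit sphere (cosine similarity), seeded by density peaks."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = Xn[density_peaks_seeds(X, k)]
    for _ in range(iters):
        labels = (Xn @ C.T).argmax(axis=1)            # assign to most-similar centroid
        for j in range(k):
            members = Xn[labels == j]
            if len(members):
                c = members.mean(axis=0)
                C[j] = c / np.linalg.norm(c)          # re-normalize centroid
    return labels, C
```

Seeding K-means with density peaks avoids the random-initialization sensitivity of plain K-means, which is the improvement the abstract attributes to the method; using cosine rather than Euclidean distance matches its reported comparison against Euclidean, Pearson, and KL-divergence measures.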

Figures:
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3504/10412592/9be0a511054f/41598_2023_40062_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3504/10412592/245e6608d9ba/41598_2023_40062_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3504/10412592/6ec0803e194a/41598_2023_40062_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3504/10412592/bc8f4d46fe6f/41598_2023_40062_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3504/10412592/b7c9b3d7bf91/41598_2023_40062_Fig5_HTML.jpg

Similar Articles

1. A method for constructing word sense embeddings based on word sense induction.
Sci Rep. 2023 Aug 9;13(1):12945. doi: 10.1038/s41598-023-40062-3.
2. Word Senses as Clusters of Meaning Modulations: A Computational Model of Polysemy.
Cogn Sci. 2021 Apr;45(4):e12955. doi: 10.1111/cogs.12955.
3. Polysemy in Sentence Comprehension: Effects of Meaning Dominance.
J Mem Lang. 2012 Nov 1;67(4):407-425. doi: 10.1016/j.jml.2012.07.010. Epub 2012 Sep 4.
4. A comprehensive dataset for Arabic word sense disambiguation.
Data Brief. 2024 Jun 4;55:110591. doi: 10.1016/j.dib.2024.110591. eCollection 2024 Aug.
5. Biomedical word sense disambiguation with bidirectional long short-term memory and attention-based neural networks.
BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):502. doi: 10.1186/s12859-019-3079-8.
6. The Mental Representation of Polysemy across Word Classes.
Front Psychol. 2018 Feb 21;9:192. doi: 10.3389/fpsyg.2018.00192. eCollection 2018.
7. Unsupervised multi-sense language models for natural language processing tasks.
Neural Netw. 2021 Oct;142:397-409. doi: 10.1016/j.neunet.2021.05.023. Epub 2021 May 25.
8. The representation of polysemy: MEG evidence.
J Cogn Neurosci. 2006 Jan;18(1):97-109. doi: 10.1162/089892906775250003.
9. Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation.
J Biomed Inform. 2017 Sep;73:137-147. doi: 10.1016/j.jbi.2017.08.001. Epub 2017 Aug 7.
10. Feature-rich multiplex lexical networks reveal mental strategies of early language learning.
Sci Rep. 2023 Jan 26;13(1):1474. doi: 10.1038/s41598-022-27029-6.

References Cited in This Article

1. Unsupervised multi-sense language models for natural language processing tasks.
Neural Netw. 2021 Oct;142:397-409. doi: 10.1016/j.neunet.2021.05.023. Epub 2021 May 25.
2. Machine learning. Clustering by fast search and find of density peaks.
Science. 2014 Jun 27;344(6191):1492-6. doi: 10.1126/science.1242072.
3. Long short-term memory.
Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.