

Unsupervised multi-sense language models for natural language processing tasks.

Authors

Roh Jihyeon, Park Sungjin, Kim Bo-Kyeong, Oh Sang-Hoon, Lee Soo-Young

Affiliations

School of Electrical Engineering and Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea.

Information & Electronics Research Institute, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea.

Publication

Neural Netw. 2021 Oct;142:397-409. doi: 10.1016/j.neunet.2021.05.023. Epub 2021 May 25.

DOI: 10.1016/j.neunet.2021.05.023
PMID: 34139656
Abstract

Existing language models (LMs) represent each word with only a single representation, which is unsuitable for processing words with multiple meanings. This issue has often been compounded by the lack of availability of large-scale data annotated with word meanings. In this paper, we propose a sense-aware framework that can process multi-sense word information without relying on annotated data. In contrast to the existing multi-sense representation models, which handle information in a restricted context, our framework provides context representations encoded without ignoring word order information or long-term dependency. The proposed framework consists of a context representation stage to encode the variable-size context, a sense-labeling stage that involves unsupervised clustering to infer a probable sense for a word in each context, and a multi-sense LM (MSLM) learning stage to learn the multi-sense representations. Particularly for the evaluation of MSLMs with different vocabulary sizes, we propose a new metric, i.e., unigram-normalized perplexity (PPLu), which is also understood as the negated mutual information between a word and its context information. Additionally, there is a theoretical verification of PPLu on the change of vocabulary size. Also, we adopt a method of estimating the number of senses, which does not require further hyperparameter search for an LM performance. For the LMs in our framework, both unidirectional and bidirectional architectures based on long short-term memory (LSTM) and Transformers are adopted. We conduct comprehensive experiments on three language modeling datasets to perform quantitative and qualitative comparisons of various LMs. Our MSLM outperforms single-sense LMs (SSLMs) with the same network architecture and parameters. It also shows better performance on several downstream natural language processing tasks in the General Language Understanding Evaluation (GLUE) and SuperGLUE benchmarks.
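The unigram-normalized perplexity described above can be written out explicitly. This is a reconstruction from the abstract's description, not the paper's verbatim definition: the notation $c_t$ for the context of word $w_t$, and $P(w_t)$ for the unigram probability, are assumed here.

```latex
\log \mathrm{PPLu}
  = -\frac{1}{N}\sum_{t=1}^{N} \log \frac{P(w_t \mid c_t)}{P(w_t)}
```

Dividing the model probability by the unigram probability removes the vocabulary-size dependence of ordinary perplexity, and the averaged log-ratio is an empirical estimate of the mutual information between a word and its context — which is why the abstract characterizes PPLu as the negated mutual information between a word and its context information.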

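As a minimal illustration of the sense-labeling stage (unsupervised clustering of context representations to infer a probable sense for each occurrence of a word), here is a hedged sketch. The function name `label_senses`, the plain k-means procedure, and the toy data are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def label_senses(context_vecs, n_senses, n_iters=20):
    """Assign a sense label to each occurrence of a word by k-means
    clustering of its context vectors (illustrative stand-in for the
    paper's unsupervised sense-labeling stage)."""
    # deterministic init: first and last occurrences as starting centroids
    centroids = context_vecs[[0, len(context_vecs) - 1]].copy()[:n_senses]
    for _ in range(n_iters):
        # assign each context to its nearest centroid
        dists = np.linalg.norm(context_vecs[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        # update each centroid as the mean of its assigned contexts
        for k in range(n_senses):
            if (labels == k).any():
                centroids[k] = context_vecs[labels == k].mean(axis=0)
    return labels

# toy data: two well-separated context clusters, i.e. two "senses"
rng = np.random.default_rng(0)
contexts = np.vstack([rng.standard_normal((50, 8)) + 5,
                      rng.standard_normal((50, 8)) - 5])
labels = label_senses(contexts, n_senses=2)
```

In the paper's framework, the context vectors would come from the context representation stage (LSTM- or Transformer-based encodings of the variable-size context) rather than random data, and the number of senses would be chosen by the estimation method the abstract mentions.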

Similar Articles

1
Unsupervised multi-sense language models for natural language processing tasks.
Neural Netw. 2021 Oct;142:397-409. doi: 10.1016/j.neunet.2021.05.023. Epub 2021 May 25.
2
An experimental study of graph connectivity for unsupervised word sense disambiguation.
IEEE Trans Pattern Anal Mach Intell. 2010 Apr;32(4):678-92. doi: 10.1109/TPAMI.2009.36.
3
A novel framework for biomedical entity sense induction.
J Biomed Inform. 2018 Aug;84:31-41. doi: 10.1016/j.jbi.2018.06.007. Epub 2018 Jun 20.
4
Biomedical word sense disambiguation with bidirectional long short-term memory and attention-based neural networks.
BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):502. doi: 10.1186/s12859-019-3079-8.
5
deepBioWSD: effective deep neural word sense disambiguation of biomedical text data.
J Am Med Inform Assoc. 2019 May 1;26(5):438-446. doi: 10.1093/jamia/ocy189.
6
Harmony Search Algorithm for Word Sense Disambiguation.
PLoS One. 2015 Sep 30;10(9):e0136614. doi: 10.1371/journal.pone.0136614. eCollection 2015.
7
A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
8
Dependency-based Siamese long short-term memory network for learning sentence representations.
PLoS One. 2018 Mar 7;13(3):e0193919. doi: 10.1371/journal.pone.0193919. eCollection 2018.
9
Syntactically-informed word representations from graph neural network.
Neurocomputing (Amst). 2020 Nov 6;413:431-443. doi: 10.1016/j.neucom.2020.06.070.
10
Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features.
BMC Med Inform Decis Mak. 2013;13 Suppl 1(Suppl 1):S1. doi: 10.1186/1472-6947-13-S1-S1. Epub 2013 Apr 5.

Cited By

1
ChatGPT in surgery: a revolutionary innovation?
Surg Today. 2024 Aug;54(8):964-971. doi: 10.1007/s00595-024-02800-6. Epub 2024 Feb 29.
2
A method for constructing word sense embeddings based on word sense induction.
Sci Rep. 2023 Aug 9;13(1):12945. doi: 10.1038/s41598-023-40062-3.
3
A Novel Adaptive Affective Cognition Analysis Model for College Students Using a Deep Convolution Neural Network and Deep Features.
Comput Intell Neurosci. 2022 Aug 27;2022:2114114. doi: 10.1155/2022/2114114. eCollection 2022.
4
Multi-Source Selection Transfer Learning with Privacy-Preserving.
Neural Process Lett. 2022;54(6):4921-4950. doi: 10.1007/s11063-022-10841-6. Epub 2022 May 7.