
Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts

Authors

Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Jiawei Han

Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, United States.

School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA, United States.

Publication

Front Big Data. 2020 Mar 11;3:9. doi: 10.3389/fdata.2020.00009. eCollection 2020.

DOI: 10.3389/fdata.2020.00009
PMID: 33693384
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7931948/
Abstract

Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations to encode word semantics. Word representations are typically learned by modeling local contexts of words, assuming that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in the unsupervised word embedding learning. Global contexts, referring to the broader semantic units, such as the document or paragraph where the word appears, can capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model both local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models to demonstrate how local and global contexts are jointly modeled, assuming a generative relationship between words and contexts. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case study show that despite their simplicity, our two proposed models achieve superior performance on word similarity and text classification tasks.
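As a rough illustration only (this is not the paper's actual models or code, whose details are not given here), the core idea of jointly conditioning a target word on its local context and a global document-level context can be sketched as a small PV-DM-style toy: predict each word from the average of its neighbors' vectors (local) combined with a learned per-document vector (global). All names, the corpus, and hyperparameters below are invented for the sketch.

```python
import numpy as np

# Toy sketch (NOT the paper's implementation): each target word is predicted
# from the average of (a) the mean of its local context word vectors and
# (b) a global vector for the document it appears in, via softmax + SGD.
docs = [
    "the cat sat on the mat".split(),
    "stocks rose as traders bought shares".split(),
]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}
V, dim, window, lr = len(vocab), 8, 1, 0.05

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, dim))        # local context word vectors
W_out = rng.normal(scale=0.1, size=(V, dim))       # output (prediction) vectors
G = rng.normal(scale=0.1, size=(len(docs), dim))   # global document vectors

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

losses = []
for epoch in range(200):
    total = 0.0
    for d, doc in enumerate(docs):
        for pos, target in enumerate(doc):
            ctx = [w2i[doc[j]]
                   for j in range(max(0, pos - window),
                                  min(len(doc), pos + window + 1))
                   if j != pos]
            # joint hidden state: local context mean combined with global doc vector
            h = 0.5 * (W_in[ctx].mean(axis=0) + G[d])
            p = softmax(W_out @ h)
            t = w2i[target]
            total += -np.log(p[t])
            err = p.copy()
            err[t] -= 1.0                      # dLoss/dlogits for cross-entropy
            grad_h = W_out.T @ err
            W_out -= lr * np.outer(err, h)
            W_in[ctx] -= lr * 0.5 * grad_h / len(ctx)
            G[d] -= lr * 0.5 * grad_h
    losses.append(total)
```

Because the hidden state averages both signals, gradients flow into the word vectors and the document vector simultaneously, which is the "joint modeling of local and global contexts" intuition in miniature; the paper's actual objectives and their generative interpretation are more principled than this toy.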


Figures:
- Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b718/7931948/374da76caef9/fdata-03-00009-g0001.jpg
- Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b718/7931948/999bad1383fd/fdata-03-00009-g0002.jpg
- Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b718/7931948/aa270b15e078/fdata-03-00009-g0003.jpg
- Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b718/7931948/879e1a5e5a9f/fdata-03-00009-g0004.jpg

Similar Articles

1. Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts. Front Big Data. 2020 Mar 11;3:9. doi: 10.3389/fdata.2020.00009. eCollection 2020.
2. Biomedical Text Classification Using Augmented Word Representation Based on Distributional and Relational Contexts. Comput Intell Neurosci. 2023 Feb 15;2023:2989791. doi: 10.1155/2023/2989791. eCollection 2023.
3. Impact of word embedding models on text analytics in deep learning environment: a review. Artif Intell Rev. 2023 Feb 22:1-81. doi: 10.1007/s10462-023-10419-1.
4. A supervised topic embedding model and its application. PLoS One. 2022 Nov 4;17(11):e0277104. doi: 10.1371/journal.pone.0277104. eCollection 2022.
5. Jointly learning word embeddings using a corpus and a knowledge base. PLoS One. 2018 Mar 12;13(3):e0193094. doi: 10.1371/journal.pone.0193094. eCollection 2018.
6. Incorporating linguistic knowledge for learning distributed word representations. PLoS One. 2015 Apr 13;10(4):e0118437. doi: 10.1371/journal.pone.0118437. eCollection 2015.
7. Neural sentence embedding models for semantic similarity estimation in the biomedical domain. BMC Bioinformatics. 2019 Apr 11;20(1):178. doi: 10.1186/s12859-019-2789-2.
8. Opposing effects of semantic diversity in lexical and semantic relatedness decisions. J Exp Psychol Hum Percept Perform. 2015 Apr;41(2):385-402. doi: 10.1037/a0038995. Epub 2015 Mar 9.
9. Short text topic modelling using local and global word-context semantic correlation. Multimed Tools Appl. 2023 Feb 2:1-23. doi: 10.1007/s11042-023-14352-x.
10. Fine-Tuning Word Embeddings for Hierarchical Representation of Data Using a Corpus and a Knowledge Base for Various Machine Learning Applications. Comput Math Methods Med. 2021 Nov 16;2021:9761163. doi: 10.1155/2021/9761163. eCollection 2021.

Cited By

1. Enhancing chemical synthesis research with NLP: Word embeddings for chemical reagent identification-A case study on nano-FeCu. iScience. 2024 Aug 29;27(10):110780. doi: 10.1016/j.isci.2024.110780. eCollection 2024 Oct 18.

References

1. Rationale-Augmented Convolutional Neural Networks for Text Classification. Proc Conf Empir Methods Nat Lang Process. 2016 Nov;2016:795-804. doi: 10.18653/v1/d16-1076.