

Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts.

Authors

Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Jiawei Han

Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, United States.

School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta, GA, United States.

Publication

Front Big Data. 2020 Mar 11;3:9. doi: 10.3389/fdata.2020.00009. eCollection 2020.

Abstract

Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations that encode word semantics. Word representations are typically learned by modeling the local contexts of words, on the assumption that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in unsupervised word embedding learning. Global contexts, which refer to broader semantic units such as the document or paragraph in which a word appears, capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models, assuming a generative relationship between words and contexts, to show how the two kinds of context are jointly modeled. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case studies show that, despite their simplicity, the two proposed models achieve superior performance on word similarity and text classification tasks.
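The abstract only sketches the idea; the exact objectives are defined in the paper itself. As a rough illustration of what "jointly modeling local and global contexts" can look like, the NumPy sketch below combines a skip-gram-style negative-sampling loss over window neighbors (local) with a doc2vec-DBOW-style loss tying each document vector to the words it contains (global). The toy corpus, hyperparameters, and the DBOW-style global term are all illustrative assumptions, not the authors' actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: each document is one "global context" (assumption for illustration).
docs = [
    "the cat sat on the mat".split(),
    "stock markets fell sharply on monday".split(),
]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}

dim, window, lr, n_neg, epochs = 16, 2, 0.05, 3, 200
W = rng.normal(0, 0.1, (len(vocab), dim))  # input (target) word vectors
C = rng.normal(0, 0.1, (len(vocab), dim))  # output (context) word vectors
D = rng.normal(0, 0.1, (len(docs), dim))   # global (document) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

def neg_sampling_step(v, ctx):
    """Push v toward output vector C[ctx] and away from n_neg random output
    vectors (negative sampling); updates C in place and returns the gradient
    of the loss with respect to v."""
    grad_v = np.zeros_like(v)
    g = sigmoid(v @ C[ctx]) - 1.0            # positive pair
    grad_v += g * C[ctx]
    C[ctx] -= lr * g * v
    for neg in rng.integers(0, len(vocab), n_neg):
        g = sigmoid(v @ C[neg])              # negative pairs
        grad_v += g * C[neg]
        C[neg] -= lr * g * v
    return grad_v

for _ in range(epochs):
    for d_idx, doc in enumerate(docs):
        for pos, word in enumerate(doc):
            t = w2i[word]
            # Local context: skip-gram over a +/- `window` neighborhood.
            lo, hi = max(0, pos - window), min(len(doc), pos + window + 1)
            for j in range(lo, hi):
                if j != pos:
                    W[t] -= lr * neg_sampling_step(W[t], w2i[doc[j]])
            # Global context: the document vector predicts its words,
            # in the spirit of doc2vec's DBOW objective.
            D[d_idx] -= lr * neg_sampling_step(D[d_idx], t)
```

Under this joint objective, two words end up close either because they share window neighbors or because they co-occur in the same documents, which is the complementarity between local and global contexts that the abstract argues for.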


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b718/7931948/374da76caef9/fdata-03-00009-g0001.jpg
