Visual Exploration of Semantic Relationships in Neural Word Embeddings.

Publication information

IEEE Trans Vis Comput Graph. 2018 Jan;24(1):553-562. doi: 10.1109/TVCG.2017.2745141. Epub 2017 Aug 29.

Abstract

Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). However, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. In particular, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. Here, we introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.
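As a rough illustration of the baseline practice the abstract describes (a PCA projection used to inspect linear analogy relationships), the sketch below projects synthetic vector pairs to two dimensions and checks whether the pair offsets stay parallel. It is not the paper's method: the vectors are random stand-ins for word embeddings, and the names `pca_2d`, `first`, `second`, and `offset` are hypothetical.

```python
import numpy as np

def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
dim, n_pairs = 50, 5

# Assumption: synthetic data, not real embeddings. Each pair (a, b)
# differs by one shared offset plus small noise, mimicking an analogy
# relation such as singular -> plural.
offset = 3.0 * rng.normal(size=dim)
first = rng.normal(size=(n_pairs, dim))
second = first + offset + 0.05 * rng.normal(size=(n_pairs, dim))

points = pca_2d(np.vstack([first, second]))

# Direction of each pair's offset in the 2D view, compared with the first pair.
offsets_2d = points[n_pairs:] - points[:n_pairs]
unit = offsets_2d / np.linalg.norm(offsets_2d, axis=1, keepdims=True)
cosines = unit @ unit[0]
print(cosines)
```

Here the offsets stay nearly parallel because the data were constructed with a single dominant shared direction. Real embeddings offer no such guarantee, which is the kind of unreliability in standard projections that the abstract points to.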
