

Similar Articles

1. Nonparametric Spherical Topic Modeling with Word Embeddings.
   Proc Conf Assoc Comput Linguist Meet. 2016 Aug;2016:537-542. doi: 10.18653/v1/P16-2087.
2. Combining Knowledge Graph and Word Embeddings for Spherical Topic Modeling.
   IEEE Trans Neural Netw Learn Syst. 2023 Jul;34(7):3609-3623. doi: 10.1109/TNNLS.2021.3112045. Epub 2023 Jul 6.
3. Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings.
   J Biomed Inform. 2019 Feb;90:103096. doi: 10.1016/j.jbi.2019.103096. Epub 2019 Jan 14.
4. A Method of Short Text Representation Fusion with Weighted Word Embeddings and Extended Topic Information.
   Sensors (Basel). 2022 Jan 29;22(3):1066. doi: 10.3390/s22031066.
5. Jointly learning word embeddings using a corpus and a knowledge base.
   PLoS One. 2018 Mar 12;13(3):e0193094. doi: 10.1371/journal.pone.0193094. eCollection 2018.
6. Gaussian hierarchical latent Dirichlet allocation: Bringing polysemy back.
   PLoS One. 2023 Jul 12;18(7):e0288274. doi: 10.1371/journal.pone.0288274. eCollection 2023.
7. Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.
   IEEE Trans Image Process. 2018;27(1):50-63. doi: 10.1109/TIP.2017.2718667.
8. A comparison of word embeddings for the biomedical natural language processing.
   J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
9. Topic models with elements of neural networks: investigation of stability, coherence, and determining the optimal number of topics.
   PeerJ Comput Sci. 2024 Jan 3;10:e1758. doi: 10.7717/peerj-cs.1758. eCollection 2024.
10. Latent Topic Text Representation Learning on Statistical Manifolds.
    IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5643-5654. doi: 10.1109/TNNLS.2018.2808332. Epub 2018 Mar 16.

Cited By

1. Topic models with elements of neural networks: investigation of stability, coherence, and determining the optimal number of topics.
   PeerJ Comput Sci. 2024 Jan 3;10:e1758. doi: 10.7717/peerj-cs.1758. eCollection 2024.
2. Gaussian hierarchical latent Dirichlet allocation: Bringing polysemy back.
   PLoS One. 2023 Jul 12;18(7):e0288274. doi: 10.1371/journal.pone.0288274. eCollection 2023.

References

1. Nested Hierarchical Dirichlet Processes.
   IEEE Trans Pattern Anal Mach Intell. 2015 Feb;37(2):256-70. doi: 10.1109/TPAMI.2014.2318728.

Nonparametric Spherical Topic Modeling with Word Embeddings

Authors

Batmanghelich Kayhan, Saeedi Ardavan, Narasimhan Karthik, Gershman Sam

Affiliations

CSAIL, MIT.

Harvard University.

Publication Info

Proc Conf Assoc Comput Linguist Meet. 2016 Aug;2016:537-542. doi: 10.18653/v1/P16-2087.

DOI: 10.18653/v1/P16-2087
PMID: 30636838
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6327958/
Abstract

Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.
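The abstract's central modeling choice is to treat normalized word embeddings as directional data and score them with the von Mises-Fisher (vMF) density, f(x; μ, κ) = C_p(κ) exp(κ μᵀx) on the unit sphere, rather than a categorical or Gaussian likelihood. The sketch below is an illustrative implementation of that density only (not the authors' HDP model or inference code); the function name and toy vectors are hypothetical.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel function: ive(v, z) = iv(v, z) * exp(-z)


def vmf_log_pdf(x, mu, kappa):
    """Log-density of the von Mises-Fisher distribution vMF(mu, kappa),
    evaluated at a unit vector x in R^p (i.e. on the (p-1)-sphere)."""
    p = mu.shape[0]
    # Log-normalizer log C_p(kappa). Using ive instead of iv avoids overflow
    # for large kappa, since log I_v(kappa) = log ive(v, kappa) + kappa.
    log_c = ((p / 2 - 1) * np.log(kappa)
             - (p / 2) * np.log(2 * np.pi)
             - (np.log(ive(p / 2 - 1, kappa)) + kappa))
    # kappa * mu.x is the "directional" score: cosine similarity scaled by concentration
    return log_c + kappa * float(mu @ x)


# Toy check: a word embedding aligned with the topic mean mu scores higher
# than an orthogonal one; the log-densities differ by exactly kappa * (1 - 0).
mu = np.array([1.0, 0.0, 0.0])
ortho = np.array([0.0, 1.0, 0.0])
print(vmf_log_pdf(mu, mu, 5.0) - vmf_log_pdf(ortho, mu, 5.0))  # 5.0
```

Because the density depends on x only through μᵀx, a topic prefers words whose embeddings have high cosine similarity to its mean direction, which is exactly the semantic-consistency property of embeddings that the abstract says categorical and Gaussian likelihoods fail to exploit.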
