
Renormalization Analysis of Topic Models.

Authors

Koltcov Sergei, Ignatenko Vera

Affiliation

Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, 55/2 Sedova St., 192148 St. Petersburg, Russia.

Publication

Entropy (Basel). 2020 May 16;22(5):556. doi: 10.3390/e22050556.

DOI: 10.3390/e22050556
PMID: 33286328
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7517079/
Abstract

In practice, to build a machine learning model of big data, one needs to tune model parameters. The process of parameter tuning involves extremely time-consuming and computationally expensive grid search. However, the theory of statistical physics provides techniques allowing us to optimize this process. The paper shows that a function of the output of topic modeling demonstrates self-similar behavior under variation of the number of clusters. Such behavior allows using a renormalization technique. A combination of renormalization procedure with the Renyi entropy approach allows for quick searching of the optimal number of topics. In this paper, the renormalization procedure is developed for the probabilistic Latent Semantic Analysis (pLSA), and the Latent Dirichlet Allocation model with variational Expectation-Maximization algorithm (VLDA) and the Latent Dirichlet Allocation model with granulated Gibbs sampling procedure (GLDA). The experiments were conducted on two test datasets with a known number of topics in two different languages and on one unlabeled test dataset with an unknown number of topics. The paper shows that the renormalization procedure allows for finding an approximation of the optimal number of topics at least 30 times faster than the grid search without significant loss of quality.
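The renormalization idea summarized in the abstract can be sketched in a few lines of NumPy. This is an illustrative simplification, not the authors' implementation: the function names are my own, `renyi_score` is a generic order-q Rényi entropy rather than the paper's specific free-energy-based measure, and here the pair to merge is chosen by cosine similarity (the paper also studies random and minimum-local-entropy merge strategies). The key point it demonstrates is that the model is fit once with many topics, and smaller topic numbers are then reached by cheap merges instead of refitting.

```python
import numpy as np

def merge_topics(phi, i, j):
    """Merge topics i and j of a T x W topic-word matrix into a single
    renormalized topic (their normalized sum), giving a (T-1) x W matrix."""
    merged = phi[i] + phi[j]
    merged /= merged.sum()
    rest = np.delete(phi, [i, j], axis=0)
    return np.vstack([rest, merged])

def renyi_score(phi, q=2.0):
    """Order-q Renyi entropy of the flattened, renormalized topic-word
    distribution -- a simplified proxy for the paper's entropy measure."""
    p = phi.flatten()
    p = p / p.sum()
    return np.log((p ** q).sum()) / (1.0 - q)

def renormalization_curve(phi, t_min=2):
    """Starting from a model fitted once with many topics, repeatedly merge
    the two most similar topics and record the score at every intermediate
    topic count; the curve's extremum approximates the optimal number."""
    scores = {}
    while phi.shape[0] >= t_min:
        scores[phi.shape[0]] = renyi_score(phi)
        if phi.shape[0] == t_min:
            break
        # pick the most similar pair by cosine similarity (a simplification)
        norms = np.linalg.norm(phi, axis=1)
        sims = (phi @ phi.T) / (norms[:, None] * norms[None, :])
        np.fill_diagonal(sims, -np.inf)
        i, j = np.unravel_index(np.argmax(sims), sims.shape)
        phi = merge_topics(phi, i, j)
    return scores
```

Each merge step costs far less than refitting pLSA/LDA at every candidate topic number, which is the source of the at-least-30x speedup over grid search that the abstract reports.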


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/5271d2753b6d/entropy-22-00556-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/3daaeb1b156b/entropy-22-00556-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/f47127a8d15b/entropy-22-00556-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/a0609b77e5c6/entropy-22-00556-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/2202ae11a173/entropy-22-00556-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/662f97ea0213/entropy-22-00556-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/e4d3d3808ca0/entropy-22-00556-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/ca6accb811d0/entropy-22-00556-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/3a04446e8691/entropy-22-00556-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/27efa4afe903/entropy-22-00556-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/bf0d49dd1c2c/entropy-22-00556-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/b8164bc1b785/entropy-22-00556-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/8a1d30db723f/entropy-22-00556-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/f94a36dd3263/entropy-22-00556-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/094259029359/entropy-22-00556-g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/852b08379032/entropy-22-00556-g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b36/7517079/12668f0a639e/entropy-22-00556-g017.jpg

Similar articles

1
Renormalization Analysis of Topic Models.
Entropy (Basel). 2020 May 16;22(5):556. doi: 10.3390/e22050556.
2
Analyzing the Influence of Hyper-parameters and Regularizers of Topic Modeling in Terms of Renyi Entropy.
Entropy (Basel). 2020 Mar 30;22(4):394. doi: 10.3390/e22040394.
3
Estimating Topic Modeling Performance with Sharma-Mittal Entropy.
Entropy (Basel). 2019 Jul 5;21(7):660. doi: 10.3390/e21070660.
4
Analysis and tuning of hierarchical topic models based on Renyi entropy approach.
PeerJ Comput Sci. 2021 Jul 29;7:e608. doi: 10.7717/peerj-cs.608. eCollection 2021.
5
Gaussian hierarchical latent Dirichlet allocation: Bringing polysemy back.
PLoS One. 2023 Jul 12;18(7):e0288274. doi: 10.1371/journal.pone.0288274. eCollection 2023.
6
Topic models with elements of neural networks: investigation of stability, coherence, and determining the optimal number of topics.
PeerJ Comput Sci. 2024 Jan 3;10:e1758. doi: 10.7717/peerj-cs.1758. eCollection 2024.
7
Predicting protein-protein relationships from literature using latent topics.
Genome Inform. 2009 Oct;23(1):3-12.
8
Investigating the Efficient Use of Word Embedding with Neural-Topic Models for Interpretable Topics from Short Texts.
Sensors (Basel). 2022 Jan 23;22(3):852. doi: 10.3390/s22030852.
9
[The hierarchical clustering analysis of hyperspectral image based on probabilistic latent semantic analysis].
Guang Pu Xue Yu Guang Pu Fen Xi. 2011 Sep;31(9):2471-5.
10
Evaluation of clustering and topic modeling methods over health-related tweets and emails.
Artif Intell Med. 2021 Jul;117:102096. doi: 10.1016/j.artmed.2021.102096. Epub 2021 May 7.

Cited by

1
Selection of the Optimal Number of Topics for LDA Topic Model-Taking Patent Policy Analysis as an Example.
Entropy (Basel). 2021 Oct 3;23(10):1301. doi: 10.3390/e23101301.

References

1
Analyzing the Influence of Hyper-parameters and Regularizers of Topic Modeling in Terms of Renyi Entropy.
Entropy (Basel). 2020 Mar 30;22(4):394. doi: 10.3390/e22040394.
2
Estimating Topic Modeling Performance with Sharma-Mittal Entropy.
Entropy (Basel). 2019 Jul 5;21(7):660. doi: 10.3390/e21070660.
3
Thermodynamics and signatures of criticality in a network of neurons.
Proc Natl Acad Sci U S A. 2015 Sep 15;112(37):11508-13. doi: 10.1073/pnas.1514188112. Epub 2015 Sep 1.
4
Finding scientific topics.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5228-35. doi: 10.1073/pnas.0307752101. Epub 2004 Feb 10.
5
Fractal measures and their singularities: The characterization of strange sets.
Phys Rev A Gen Phys. 1986 Feb;33(2):1141-1151. doi: 10.1103/physreva.33.1141.