A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts.

Author information

Egger Roman, Yu Joanne

Affiliations

Innovation and Management in Tourism, Salzburg University of Applied Sciences, Salzburg, Austria.

Department of Tourism and Service Management, Modul University Vienna, Vienna, Austria.

Publication information

Front Sociol. 2022 May 6;7:886498. doi: 10.3389/fsoc.2022.886498. eCollection 2022.

Abstract

The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often leads to methodological challenges in both data collection and analysis. To bridge the developing field of computational science and empirical social research, this study evaluates the performance of four topic modeling techniques, namely latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the strengths and weaknesses of the different algorithms in a social science context. Based on certain details of the analytical procedures and on quality issues, this research sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1ba/9120935/8055fe17eecd/fsoc-07-886498-g0001.jpg
