A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts.

Authors

Egger Roman, Yu Joanne

Affiliations

Innovation and Management in Tourism, Salzburg University of Applied Sciences, Salzburg, Austria.

Department of Tourism and Service Management, Modul University Vienna, Vienna, Austria.

Publication information

Front Sociol. 2022 May 6;7:886498. doi: 10.3389/fsoc.2022.886498. eCollection 2022.

Abstract

The richness of social media data has opened a new avenue for social science research to gain insights into human behaviors and experiences. In particular, emerging data-driven approaches relying on topic models provide entirely new perspectives on interpreting social phenomena. However, the short, text-heavy, and unstructured nature of social media content often leads to methodological challenges in both data collection and analysis. In order to bridge the developing field of computational science and empirical social research, this study aims to evaluate the performance of four topic modeling techniques, namely latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), Top2Vec, and BERTopic. In view of the interplay between human relations and digital media, this research takes Twitter posts as the reference point and assesses the performance of different algorithms concerning their strengths and weaknesses in a social science context. Based on certain details during the analytical procedures and on quality issues, this research sheds light on the efficacy of using BERTopic and NMF to analyze Twitter data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1ba/9120935/8055fe17eecd/fsoc-07-886498-g0001.jpg
