An iterative topic model filtering framework for short and noisy user-generated data: analyzing conspiracy theories on twitter.

Authors

Kant Gillian, Wiebelt Levin, Weisser Christoph, Kis-Katos Krisztina, Luber Mattias, Säfken Benjamin

Affiliations

University of Göttingen, Göttingen, Germany.

Campus-Institut Data Science (CIDAS), University of Göttingen, Göttingen, Germany.

Publication information

Int J Data Sci Anal. 2022 May 6:1-21. doi: 10.1007/s41060-022-00321-4.

Abstract

Conspiracy theories have seen a rise in popularity in recent years. Spreading quickly through social media, they can have a disruptive effect that biases the public view on policy decisions and events. We present a novel pre-processing approach for Latent Dirichlet Allocation (LDA), called Iterative Filtering, to study such phenomena based on Twitter data. In combination with Hashtag Pooling as an additional pre-processing step, it achieves a coherent framing of the discussion and topics of interest, despite the inherent noisiness and sparseness of Twitter data. The approach enables researchers to gain detailed insights into discourses of interest on Twitter by iteratively identifying tweets related to the topic under investigation. As an application, we study the dynamics of conspiracy-related topics on US Twitter during the last four months of 2020, which were dominated by the US presidential election and Covid-19. We monitor the public discourse in the USA with geo-spatial Twitter data and identify conspiracy-related content by estimating LDA topic models. We find that in this period, the usual conspiracy-related topics played a marginal role in comparison with dominating topics such as the US presidential election or the general discussion about Covid-19. The main conspiracy theories in this period were the ones linked to "Election Fraud" and the "Covid-19-hoax." Conspiracy-related keywords tended to appear together with Trump-related words and words related to his presidential campaign.
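The abstract names Hashtag Pooling as a pre-processing step but does not spell out its details. The sketch below shows the commonly used form of the idea, under the assumption that tweets sharing a hashtag are concatenated into one longer pseudo-document before a topic model is fitted. The function name `pool_by_hashtag`, the regular expression, and the handling of tweets without hashtags are illustrative choices, not the authors' implementation.

```python
import re
from collections import defaultdict

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets):
    """Group tweets into pseudo-documents keyed by lower-cased hashtag.

    A tweet with several hashtags joins every matching pool.
    Tweets without any hashtag are returned separately, unpooled.
    """
    pools = defaultdict(list)
    unpooled = []
    for text in tweets:
        # Deduplicate tags within a tweet and ignore case.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if not tags:
            unpooled.append(text)
            continue
        for tag in tags:
            pools[tag].append(text)
    # Concatenate each pool into one longer document, in input order.
    docs = {tag: " ".join(texts) for tag, texts in pools.items()}
    return docs, unpooled
```

The pooled documents can then be tokenized and passed to any LDA implementation in place of the individual tweets. Pooling gives the model longer documents with more word co-occurrences, which is the usual motivation for applying it to short texts.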

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c56c/9072765/876381765b0b/41060_2022_321_Fig1_HTML.jpg
