How Twitter data sampling biases U.S. voter behavior characterizations.

Authors

Yang Kai-Cheng, Hui Pik-Mai, Menczer Filippo

Affiliation

Observatory on Social Media, Indiana University, Bloomington, Indiana, United States.

Publication information

PeerJ Comput Sci. 2022 Jul 1;8:e1025. doi: 10.7717/peerj-cs.1025. eCollection 2022.

DOI:10.7717/peerj-cs.1025

PMID:35875635

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9299280/

Abstract

Online social media are key platforms for the public to discuss political issues. As a result, researchers have used data from these platforms to analyze public opinions and forecast election results. The literature has shown that due to inauthentic actors such as malicious social bots and trolls, not every message is a genuine expression from a legitimate user. However, the prevalence of inauthentic activities in social data streams is still unclear, making it difficult to gauge biases of analyses based on such data. In this article, we aim to close this gap using Twitter data from the 2018 U.S. midterm elections. We propose an efficient and low-cost method to identify voters on Twitter and systematically compare their behaviors with different random samples of accounts. We find that some accounts flood the public data stream with political content, drowning the voice of the majority of voters. As a result, these hyperactive accounts are over-represented in volume samples. Hyperactive accounts are more likely to exhibit various suspicious behaviors and to share low-credibility information compared to likely voters. Our work provides insights into biased voter characterizations when using social media data to analyze political issues.
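
The over-representation described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' method or data: the account names, tweet counts, and the helper functions `sample_accounts_by_volume`, `sample_accounts_uniformly`, and `hyperactive_share` are all hypothetical, and "volume sample" is read here as "draw tweets at random and keep their authors".

```python
import random

def sample_accounts_by_volume(tweets, k, rng):
    """Draw k tweets uniformly at random and return their authors (repeats allowed)."""
    return [author for author, _ in rng.sample(tweets, k)]

def sample_accounts_uniformly(tweets, k, rng):
    """Draw k distinct accounts uniformly at random, ignoring how much each one tweets."""
    authors = sorted({author for author, _ in tweets})
    return rng.sample(authors, k)

def hyperactive_share(sampled_authors, hyperactive):
    """Fraction of sampled authors that belong to the hyperactive set."""
    return sum(a in hyperactive for a in sampled_authors) / len(sampled_authors)

# Synthetic toy data (made up for illustration only):
# 95 ordinary accounts with 2 tweets each, 5 hyperactive accounts with 200 tweets each.
rng = random.Random(0)
tweets = []
hyperactive = set()
for i in range(95):
    tweets += [(f"user{i}", j) for j in range(2)]
for i in range(5):
    name = f"hyper{i}"
    hyperactive.add(name)
    tweets += [(name, j) for j in range(200)]

by_volume = sample_accounts_by_volume(tweets, 50, rng)
by_account = sample_accounts_uniformly(tweets, 50, rng)

print("hyperactive share, volume sample: ", hyperactive_share(by_volume, hyperactive))
print("hyperactive share, account sample:", hyperactive_share(by_account, hyperactive))
```

In this toy setup the hyperactive accounts make up 5% of accounts but produce most of the tweets, so a sample drawn from the tweet stream is dominated by them, while a sample drawn over accounts is not.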
