The False positive problem of automatic bot detection in social science research.

Affiliations

Graduate Institute of Journalism, National Taiwan University, Taipei, Taiwan (R.O.C.).

Communication, Journalism, & Media Department, Suffolk University, Boston, Massachusetts, United States of America.

Publication information

PLoS One. 2020 Oct 22;15(10):e0241045. doi: 10.1371/journal.pone.0241045. eCollection 2020.

DOI:10.1371/journal.pone.0241045
PMID:33091067
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7580919/
Abstract

The identification of bots is an important and complicated task. The bot classifier "Botometer" was successfully introduced as a way to estimate the number of bots in a given list of accounts and, as a consequence, has been frequently used in academic publications. Given its relevance for academic research and our understanding of the presence of automated accounts in any given Twitter discourse, we are interested in Botometer's diagnostic ability over time. To do so, we collected the Botometer scores for five datasets (three verified as bots, two verified as human; n = 4,134) in two languages (English/German) over three months. We show that the Botometer scores are imprecise when it comes to estimating bots; especially in a different language. We further show in an analysis of Botometer scores over time that Botometer's thresholds, even when used very conservatively, are prone to variance, which, in turn, will lead to false negatives (i.e., bots being classified as humans) and false positives (i.e., humans being classified as bots). This has immediate consequences for academic research as most studies in social science using the tool will unknowingly count a high number of human users as bots and vice versa. We conclude our study with a discussion about how computational social scientists should evaluate machine learning systems that are developed for identifying bots.
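The threshold problem described in the abstract can be illustrated with a minimal sketch. It assumes a generic bot score between 0 and 1 and a fixed cut-off; the scores, labels, and the 0.5 threshold below are hypothetical and are not taken from the study or from Botometer itself. The sketch scores the same six accounts at two points in time and counts false positives (humans classified as bots) and false negatives (bots classified as humans).

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # bots classified as bots
    fp: int  # humans classified as bots (false positives)
    fn: int  # bots classified as humans (false negatives)
    tn: int  # humans classified as humans


def classify(scores, threshold):
    """Label an account a bot when its score is at or above the threshold."""
    return [s >= threshold for s in scores]


def confusion(scores, is_bot, threshold):
    """Count the four outcomes of thresholding scores against known labels."""
    predicted = classify(scores, threshold)
    pairs = list(zip(predicted, is_bot))
    tp = sum(p and t for p, t in pairs)
    fp = sum(p and not t for p, t in pairs)
    fn = sum((not p) and t for p, t in pairs)
    tn = sum((not p) and (not t) for p, t in pairs)
    return Confusion(tp, fp, fn, tn)


# Hypothetical data: the same six accounts scored at two points in time.
is_bot = [True, True, True, False, False, False]
scores_week_1 = [0.91, 0.55, 0.48, 0.52, 0.20, 0.10]
scores_week_8 = [0.88, 0.47, 0.61, 0.35, 0.58, 0.12]

THRESHOLD = 0.5  # illustrative cut-off, not a recommended value

week_1 = confusion(scores_week_1, is_bot, THRESHOLD)
week_8 = confusion(scores_week_8, is_bot, THRESHOLD)

# Indices of accounts whose label changed between the two measurements.
labels_1 = classify(scores_week_1, THRESHOLD)
labels_8 = classify(scores_week_8, THRESHOLD)
flipped = [i for i, (a, b) in enumerate(zip(labels_1, labels_8)) if a != b]

print(week_1)
print(week_8)
print("accounts with changed label:", flipped)
```

In this constructed example the aggregate counts are identical at both time points (one false positive and one false negative each), yet four of the six accounts change label. Summary counts alone can therefore hide how unstable the individual classifications are.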


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc4/7580919/747fa5c52477/pone.0241045.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc4/7580919/74f1a3a69fed/pone.0241045.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc4/7580919/2fc33d3ca061/pone.0241045.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc4/7580919/2d931a6c332b/pone.0241045.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc4/7580919/c019191cba60/pone.0241045.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc4/7580919/3f237cd1fe66/pone.0241045.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc4/7580919/4b7628173447/pone.0241045.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdc4/7580919/1797337f7a6b/pone.0241045.g008.jpg

Similar articles

1. The False positive problem of automatic bot detection in social science research.
PLoS One. 2020 Oct 22;15(10):e0241045. doi: 10.1371/journal.pone.0241045. eCollection 2020.
2. Botometer 101: social bot practicum for computational social scientists.
J Comput Soc Sci. 2022;5(2):1511-1528. doi: 10.1007/s42001-022-00177-5. Epub 2022 Aug 20.
3. Insights into elections: An ensemble bot detection coverage framework applied to the 2018 U.S. midterm elections.
PLoS One. 2021 Jan 6;16(1):e0244309. doi: 10.1371/journal.pone.0244309. eCollection 2021.
4. Detecting Bots on Russian Political Twitter.
Big Data. 2017 Dec;5(4):310-324. doi: 10.1089/big.2017.0038.
5. Assessing the Role of Social Bots During the COVID-19 Pandemic: Infodemic, Disagreement, and Criticism.
J Med Internet Res. 2022 Aug 25;24(8):e36085. doi: 10.2196/36085.
6. Public Opinion Manipulation on Social Media: Social Network Analysis of Twitter Bots during the COVID-19 Pandemic.
Int J Environ Res Public Health. 2022 Dec 7;19(24):16376. doi: 10.3390/ijerph192416376.
7. Detection and impact estimation of social bots in the Chilean Twitter network.
Sci Rep. 2024 Mar 19;14(1):6525. doi: 10.1038/s41598-024-57227-3.
8. Bots increase exposure to negative and inflammatory content in online social systems.
Proc Natl Acad Sci U S A. 2018 Dec 4;115(49):12435-12440. doi: 10.1073/pnas.1803470115. Epub 2018 Nov 20.
9. Social Bots' Role in the COVID-19 Pandemic Discussion on Twitter.
Int J Environ Res Public Health. 2023 Feb 13;20(4):3284. doi: 10.3390/ijerph20043284.
10. Social Bots: Human-Like by Means of Human Control?
Big Data. 2017 Dec;5(4):279-293. doi: 10.1089/big.2017.0044.

Cited by

1. Differential impact from individual versus collective misinformation tagging on the diversity of Twitter (X) information engagement and mobility.
Nat Commun. 2025 Jan 24;16(1):973. doi: 10.1038/s41467-025-55868-0.
2. Social bots spoil activist sentiment without eroding engagement.
Sci Rep. 2024 Nov 6;14(1):27005. doi: 10.1038/s41598-024-74032-0.
3. Understanding anti-immigration sentiment spreading on Twitter.
PLoS One. 2024 Sep 4;19(9):e0307917. doi: 10.1371/journal.pone.0307917. eCollection 2024.
4. Detection and impact estimation of social bots in the Chilean Twitter network.
Sci Rep. 2024 Mar 19;14(1):6525. doi: 10.1038/s41598-024-57227-3.
5. Patterns of human and bots behaviour on Twitter conversations about sustainability.
Sci Rep. 2024 Feb 8;14(1):3223. doi: 10.1038/s41598-024-52471-z.
6. Overcome the fragmentation in online propaganda literature: the role of cultural and cognitive sociology.
Front Sociol. 2023 Jul 11;8:1170447. doi: 10.3389/fsoc.2023.1170447. eCollection 2023.
7. Public Opinion Manipulation on Social Media: Social Network Analysis of Twitter Bots during the COVID-19 Pandemic.
Int J Environ Res Public Health. 2022 Dec 7;19(24):16376. doi: 10.3390/ijerph192416376.
8. Social media analysis of Twitter tweets related to ASD in 2019-2020, with particular attention to COVID-19: topic modelling and sentiment analysis.
J Big Data. 2022;9(1):113. doi: 10.1186/s40537-022-00666-4. Epub 2022 Nov 25.
9. The Influence of Provaping "Gatewatchers" on the Dissemination of COVID-19 Misinformation on Twitter: Analysis of Twitter Discourse Regarding Nicotine and the COVID-19 Pandemic.
J Med Internet Res. 2022 Sep 22;24(9):e40331. doi: 10.2196/40331.
10. Botometer 101: social bot practicum for computational social scientists.
J Comput Soc Sci. 2022;5(2):1511-1528. doi: 10.1007/s42001-022-00177-5. Epub 2022 Aug 20.

References

1. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.
BMC Genomics. 2020 Jan 2;21(1):6. doi: 10.1186/s12864-019-6413-7.
2. The spread of low-credibility content by social bots.
Nat Commun. 2018 Nov 20;9(1):4787. doi: 10.1038/s41467-018-06930-7.
3. The spread of true and false news online.
Science. 2018 Mar 9;359(6380):1146-1151. doi: 10.1126/science.aap9559.
4. Ten quick tips for machine learning in computational biology.
BioData Min. 2017 Dec 8;10:35. doi: 10.1186/s13040-017-0155-3. eCollection 2017.
5. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets.
PLoS One. 2015 Mar 4;10(3):e0118432. doi: 10.1371/journal.pone.0118432. eCollection 2015.
6. Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: a discussion and proposal for an alternative approach.
Eur Radiol. 2015 Apr;25(4):932-9. doi: 10.1007/s00330-014-3487-0. Epub 2015 Jan 20.
7. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.
Biometrics. 1988 Sep;44(3):837-45.