Data and Model Biases in Social Media Analyses: A Case Study of COVID-19 Tweets.

Affiliations

University of Florida, Gainesville, Florida, USA.

University of Texas Health Science Center at Houston, Houston, Texas, USA.

Publication details

AMIA Annu Symp Proc. 2022 Feb 21;2021:1264-1273. eCollection 2021.

Abstract

During the coronavirus disease 2019 (COVID-19) pandemic, social media platforms such as Twitter have become a venue for individuals, health professionals, and government agencies to share COVID-19 information. Twitter has been a popular source of data for researchers, especially in public health studies. However, using Twitter data for research also has drawbacks and barriers. Biases can arise at every stage, from data collection methods to modeling approaches, and these biases have not been systematically assessed. In this study, we examined six different data collection methods and three machine learning (ML) models commonly used in social media analysis, in order to assess data collection bias and to measure the ML models' sensitivity to it. We showed that (1) publicly available Twitter data collection endpoints, used with appropriate strategies, can collect data that is reasonably representative of the Twitter universe; and (2) careful examination of ML models' sensitivity to data collection bias is critical.
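The abstract does not say how representativeness was measured. As one illustration only, assuming hashtag frequency is the comparison signal and Spearman rank correlation is the agreement measure (neither is confirmed by this abstract), a collected sample could be compared against a reference collection as sketched below. The function names `average_ranks`, `spearman`, and `hashtag_agreement` and the `top_k` parameter are hypothetical, not the authors' code.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def hashtag_agreement(sample_tags, reference_tags, top_k=20):
    """Hypothetical representativeness check: rank agreement of hashtag
    counts over the reference collection's top_k most frequent hashtags."""
    sample_counts = Counter(sample_tags)
    reference_counts = Counter(reference_tags)
    vocab = [tag for tag, _ in reference_counts.most_common(top_k)]
    x = [reference_counts[t] for t in vocab]
    y = [sample_counts[t] for t in vocab]  # Counter yields 0 for unseen tags
    return spearman(x, y)
```

A value near 1 would indicate that the sample ranks the reference's popular hashtags in roughly the same order; `scipy.stats.spearmanr` computes the same statistic, and pure Python is used here only to keep the sketch dependency-free.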



