The risk of racial bias while tracking influenza-related content on social media using machine learning.

Affiliation

Department of Information Systems and Cyber Security, University of Texas at San Antonio, San Antonio, Texas, USA.

Publication information

J Am Med Inform Assoc. 2021 Mar 18;28(4):839-849. doi: 10.1093/jamia/ocaa326.

DOI:10.1093/jamia/ocaa326
PMID:33484133
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7973478/
Abstract

OBJECTIVE

Machine learning is used to understand and track influenza-related content on social media. Because these systems are used at scale, they have the potential to adversely impact the people they are built to help. In this study, we explore the biases of different machine learning methods for the specific task of detecting influenza-related content. We compare the performance of each model on tweets written in Standard American English (SAE) vs African American English (AAE).

MATERIALS AND METHODS

Two influenza-related datasets are used to train 3 text classification models (support vector machine, convolutional neural network, bidirectional long short-term memory) with different feature sets. The datasets match real-world scenarios in which there is a large imbalance between SAE and AAE examples. The number of AAE examples for each class ranges from 2% to 5% in both datasets. We also evaluate each model's performance using a balanced dataset via undersampling.
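The undersampling step mentioned above can be illustrated with a minimal sketch. The abstract does not say exactly how the balanced dataset was built, so this assumes the simplest variant: randomly dropping examples until each dialect group is as large as the smallest one. The function name `undersample_by_group`, the `dialect` field, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def undersample_by_group(examples, group_key, seed=0):
    """Randomly drop examples so every group matches the smallest group's size.

    Assumption: balancing is done across dialect groups only; the paper may
    also balance within each class label.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for ex in examples:
        by_group[ex[group_key]].append(ex)

    # Size of the smallest group sets the target for every group.
    target = min(len(items) for items in by_group.values())

    balanced = []
    for items in by_group.values():
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 95 SAE tweets and 5 AAE tweets, i.e. 5% AAE.
toy = [{"text": f"tweet {i}", "dialect": "SAE"} for i in range(95)]
toy += [{"text": f"tweet {i}", "dialect": "AAE"} for i in range(95, 100)]

balanced = undersample_by_group(toy, "dialect")
counts = {d: sum(1 for ex in balanced if ex["dialect"] == d) for d in ("SAE", "AAE")}
print(counts)
```

Undersampling discards most of the majority-group data, so it trades overall training size for group balance.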

RESULTS

We find that all of the tested machine learning methods are biased on both datasets. The difference in false positive rates between SAE and AAE examples ranges from 0.01 to 0.35. The difference in the false negative rates ranges from 0.01 to 0.23. We also find that the neural network methods generally have more unfair results than the linear support vector machine on the chosen datasets.
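As a rough illustration of the kind of metric reported here, the sketch below computes the false positive rate (FPR) and false negative rate (FNR) separately for each dialect group and then takes the gap between groups. It assumes binary labels (1 = influenza-related) and an absolute difference; the abstract does not state the exact formula, and the function names and toy predictions are hypothetical.

```python
def error_rates(y_true, y_pred):
    """Return (FPR, FNR) for binary labels, where 1 is the positive class."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else float("nan")
    fnr = fn / positives if positives else float("nan")
    return fpr, fnr


def rate_gaps(y_true, y_pred, groups, group_a="SAE", group_b="AAE"):
    """Absolute FPR and FNR differences between two groups (an assumed definition)."""
    rates = {}
    for g in (group_a, group_b):
        idx = [i for i, name in enumerate(groups) if name == g]
        rates[g] = error_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    fpr_gap = abs(rates[group_a][0] - rates[group_b][0])
    fnr_gap = abs(rates[group_a][1] - rates[group_b][1])
    return fpr_gap, fnr_gap


# Toy predictions (hypothetical): four tweets per dialect group.
groups = ["SAE"] * 4 + ["AAE"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

fpr_gap, fnr_gap = rate_gaps(y_true, y_pred, groups)
print(f"FPR gap: {fpr_gap:.2f}, FNR gap: {fnr_gap:.2f}")
```

Reporting these gaps next to accuracy or F1 is one simple way to evaluate fairness alongside traditional metrics.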

CONCLUSIONS

The models that result in the most unfair predictions may vary from dataset to dataset. Practitioners should be aware of the potential harms related to applying machine learning to health-related social media data. At a minimum, we recommend evaluating fairness along with traditional evaluation metrics.
