Data Representativeness in Accessibility Datasets: A Meta-Analysis.

Authors

Kamikubo Rie, Wang Lining, Marte Crystal, Mahmood Amnah, Kacorri Hernisa

Affiliations

College of Information Studies, University of Maryland, College Park, United States.

Department of Computer Science, University of Maryland, College Park, United States.

Publication information

ASSETS. 2022 Oct;2022. doi: 10.1145/3517428.3544826. Epub 2022 Oct 22.

Abstract

As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work on AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets (datasets sourced from people with disabilities and older adults), which can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within these accessibility datasets by reviewing publicly available information on 190 of them. We find that accessibility datasets represent diverse ages, but have gender and race representation gaps. Additionally, we investigate how the sensitive and complex nature of demographic variables (e.g., gender, race & ethnicity) makes classification difficult and inconsistent, with the source of labeling often unknown. By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
