Kamikubo Rie, Wang Lining, Marte Crystal, Mahmood Amnah, Kacorri Hernisa
College of Information Studies, University of Maryland, College Park, United States.
Department of Computer Science, University of Maryland, College Park, United States.
ASSETS. 2022 Oct;2022. doi: 10.1145/3517428.3544826. Epub 2022 Oct 22.
As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work around AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets (datasets sourced from people with disabilities and older adults) that can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within datasets sourced from people with disabilities by reviewing publicly available information on 190 datasets, which we call accessibility datasets. We find that accessibility datasets represent diverse ages, but have gender and race representation gaps. Additionally, we investigate how the sensitive and complex nature of demographic variables (e.g., gender, race & ethnicity) makes classification difficult and inconsistent, with the source of labeling often unknown. By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.