Data Representativeness in Accessibility Datasets: A Meta-Analysis.

Authors

Kamikubo Rie, Wang Lining, Marte Crystal, Mahmood Amnah, Kacorri Hernisa

Affiliations

College of Information Studies, University of Maryland, College Park, United States.

Department of Computer Science, University of Maryland, College Park, United States.

Publication information

ASSETS. 2022 Oct;2022. doi: 10.1145/3517428.3544826. Epub 2022 Oct 22.

DOI:10.1145/3517428.3544826

PMID:36939417

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10024595/

Abstract

As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work around AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets (datasets sourced from people with disabilities and older adults) that can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within datasets sourced from people with disabilities by reviewing publicly available information on 190 such datasets, which we call accessibility datasets. We find that accessibility datasets represent diverse ages, but have gender and race representation gaps. Additionally, we investigate how the sensitive and complex nature of demographic variables makes classification difficult and inconsistent (e.g., gender, race & ethnicity), with the source of labeling often unknown. By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
