Department of Sociology, University of Chicago, Chicago, IL, USA.
Department of Sociology, Santa Clara University, Santa Clara, CA, USA.
Nat Hum Behav. 2023 Jul;7(7):1084-1095. doi: 10.1038/s41562-023-01587-9. Epub 2023 Apr 17.
Academics and companies increasingly draw on large datasets to understand the social world, and name-based demographic ascription tools are widespread for imputing information that is often missing from these large datasets. These approaches have drawn criticism on ethical, empirical and theoretical grounds. Using a survey of all authors listed on articles in sociology, economics and communication journals in Web of Science between 2015 and 2020, we compared self-identified demographics with name-based imputations of gender and race/ethnicity for 19,924 scholars across four gender ascription tools and four race/ethnicity ascription tools. We found substantial inequalities in how these tools misgender and misrecognize the race/ethnicity of authors, distributing erroneous ascriptions unevenly among other demographic traits. Because of the empirical and ethical consequences of these errors, scholars need to be cautious with the use of demographic imputation. We recommend five principles for the responsible use of name-based demographic inference.
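The comparison described above, checking imputed labels against self-identified ones and asking whether errors fall unevenly across groups, can be illustrated with a minimal sketch. This is not the authors' code. The function name `error_rate_by_group`, the field names and the toy records are all hypothetical, chosen only to show the idea of a per-group misclassification rate.

```python
from collections import defaultdict


def error_rate_by_group(records, group_key, truth_key, imputed_key):
    """Return {group: share of records whose imputed label differs from the self-reported one}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[imputed_key] != record[truth_key]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical toy records for illustration only; not data from the study.
records = [
    {"group": "A", "self_gender": "woman", "imputed_gender": "woman"},
    {"group": "A", "self_gender": "man", "imputed_gender": "man"},
    {"group": "B", "self_gender": "woman", "imputed_gender": "man"},
    {"group": "B", "self_gender": "man", "imputed_gender": "man"},
]

rates = error_rate_by_group(records, "group", "self_gender", "imputed_gender")
print(rates)
```

A gap between groups in the resulting rates is the kind of unequal distribution of errors the abstract refers to. A real analysis would also need to handle names for which a tool returns no prediction, which this sketch ignores.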