1 Department of Data Science, Zocdoc, New York, New York.
2 NYU Center for Data Science, New York, New York.
Big Data. 2017 Jun;5(2):120-134. doi: 10.1089/big.2016.0048.
Recent research has helped to cultivate growing awareness that machine-learning systems fueled by big data can create or exacerbate troubling disparities in society. Much of this research comes from outside of the practicing data science community, leaving its members with little concrete guidance to proactively address these concerns. This article introduces issues of discrimination to the data science community on its own terms. In it, we tour the familiar data-mining process while providing a taxonomy of common practices that have the potential to produce unintended discrimination. We also survey how discrimination is commonly measured, and suggest how familiar development processes can be augmented to mitigate systems' discriminatory potential. We advocate that data scientists should be intentional about modeling and reducing discriminatory outcomes. Without doing so, their efforts will result in perpetuating any systemic discrimination that may exist, but under a misleading veil of data-driven objectivity.