Islam Rashidul, Keya Kamrun Naher, Pan Shimei, Sarwate Anand D, Foulds James R
Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD 21250, USA.
Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, New Brunswick, NJ 08854, USA.
Entropy (Basel). 2023 Apr 14;25(4):660. doi: 10.3390/e25040660.
We propose definitions of fairness in machine learning and artificial intelligence systems that are informed by the framework of intersectionality, a critical lens from the legal, social science, and humanities literature that analyzes how interlocking systems of power and oppression affect individuals along overlapping dimensions including gender, race, sexual orientation, class, and disability. We show that our criteria behave sensibly for any subset of the set of protected attributes, and we prove economic, privacy, and generalization guarantees. Our theoretical results show that our criteria meaningfully operationalize AI fairness in terms of real-world harms, making the measurements interpretable in a manner analogous to differential privacy. We provide a simple learning algorithm, based on deterministic gradient methods, that respects our intersectional fairness criteria. Measuring fairness becomes statistically challenging in the minibatch setting because of data sparsity, which grows rapidly with the number of protected attributes and with the number of values per protected attribute. To address this, we further develop a practical learning algorithm, based on stochastic gradient methods, that incorporates stochastic estimation of the intersectional fairness criteria on minibatches to scale up to big data. Case studies on census data, the COMPAS criminal recidivism dataset, the HHP hospitalization data, and a loan application dataset from HMDA demonstrate the utility of our methods.
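The abstract does not state the fairness criterion itself, so the following is only an illustrative sketch of why sparsity matters when fairness is measured over intersecting groups. It assumes a ratio-style measure in the spirit of differential privacy: the largest absolute log-ratio of outcome probabilities between any two intersectional groups. The function names `smoothed_rates` and `epsilon_gap`, the pseudo-count `alpha`, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def smoothed_rates(groups, outcomes, alpha=1.0):
    """Estimate P(outcome = 1 | group) for each observed group.

    `groups` holds one hashable label per example, e.g. a tuple of
    protected-attribute values such as ("F", "B"). `alpha` is an assumed
    pseudo-count; with alpha > 0 every estimate stays strictly between
    0 and 1, even for groups with very few examples.
    """
    totals = Counter(groups)
    positives = Counter(g for g, y in zip(groups, outcomes) if y == 1)
    return {g: (positives[g] + alpha) / (totals[g] + 2 * alpha) for g in totals}


def epsilon_gap(rates):
    """Largest absolute log-ratio of outcome probabilities over all group
    pairs, checked for both the positive and the negative outcome.

    Requires every rate to lie strictly between 0 and 1.
    """
    eps = 0.0
    for a, b in combinations(rates, 2):
        for pa, pb in ((rates[a], rates[b]), (1 - rates[a], 1 - rates[b])):
            eps = max(eps, abs(math.log(pa) - math.log(pb)))
    return eps


# Toy data: two binary protected attributes give four intersectional groups.
groups = [("F", "A")] * 4 + [("F", "B")] * 4 + [("M", "A")] * 4 + [("M", "B")] * 4
outcomes = [1, 1, 1, 0,  1, 0, 0, 0,  1, 1, 0, 0,  1, 1, 0, 0]

rates = smoothed_rates(groups, outcomes)
print(epsilon_gap(rates))
```

The number of intersectional groups is the product of the number of values of each protected attribute, so a fixed-size minibatch leaves fewer examples per group as attributes are added. The pseudo-count here is one simple way to keep the per-group estimates away from 0 and 1 under that sparsity; it is a stand-in, not the stochastic estimator the abstract refers to.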