School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, Camperdown, NSW, 2037, Australia.
CSIRO Data61, Hobart, TAS, Australia.
Sci Rep. 2024 Aug 1;14(1):17753. doi: 10.1038/s41598-024-68651-w.
Fairness in machine learning (ML) has emerged as a critical concern as AI systems increasingly influence diverse aspects of society, from healthcare decisions to legal judgments. Many studies show evidence of unfair ML outcomes. However, the current body of literature lacks a statistically validated approach that can evaluate the fairness of a deployed ML algorithm against a dataset. A novel evaluation approach is introduced in this research based on k-fold cross-validation and statistical t-tests to assess the fairness of ML algorithms. This approach was applied to five benchmark datasets using six classical ML algorithms. Considering four fair ML definitions guided by the current literature, our analysis showed that the same dataset generates a fair outcome for one ML algorithm but an unfair result for another. Such an observation reveals complex, context-dependent fairness issues in ML, complicated further by the varied operational mechanisms of the underlying ML models. Our proposed approach enables researchers to check whether deploying any ML algorithm against a dataset is fair with respect to a protected attribute. We also discuss the broader implications of the proposed approach, highlighting a notable variability in its fairness outcomes. Our discussion underscores the need for adaptable fairness definitions and the exploration of methods to enhance the fairness of ensemble approaches, aiming to advance fair ML practices and ensure equitable AI deployment across societal sectors.
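The sketch below illustrates, in rough outline, the kind of evaluation the abstract describes: compute a fairness metric on each fold of a k-fold cross-validation and then run a t-test on the fold-level values. It is not the authors' implementation; the fairness metric (demographic parity difference), the classifier (logistic regression), the synthetic data, the one-sample t-test against a null of zero disparity, and the 0.05 threshold are all illustrative assumptions.

```python
# Hedged sketch: k-fold cross-validation + t-test on a fairness metric.
# Assumptions (not from the paper): demographic parity difference as the
# metric, logistic regression as the classifier, synthetic data, and a
# one-sample t-test against a null hypothesis of zero disparity.

import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def demographic_parity_diff(y_pred, group):
    """Difference in positive-prediction rates between the two groups."""
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()


# Synthetic stand-in for a benchmark dataset with a binary protected attribute.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
group = np.random.default_rng(0).integers(0, 2, size=len(y))

fold_diffs = []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])
    fold_diffs.append(demographic_parity_diff(y_pred, group[test_idx]))

# Is the mean disparity across folds significantly different from zero,
# i.e., is there statistically detectable unfairness for this algorithm?
t_stat, p_value = stats.ttest_1samp(fold_diffs, popmean=0.0)
print(f"mean disparity = {np.mean(fold_diffs):.3f}, "
      f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

Repeating this procedure per algorithm and per fairness definition would reproduce, in spirit, the abstract's observation that the same dataset can yield a fair verdict for one algorithm and an unfair one for another.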