Guo Dandan, Wang Chaojie, Wang Baoxiang, Zha Hongyuan
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2139-2152. doi: 10.1109/TNNLS.2022.3187165. Epub 2024 Feb 5.
As machine learning algorithms are increasingly deployed for high-impact automated decision-making, bias (in datasets or tasks) has become one of the most critical challenges in machine learning applications. Such challenges range from racial bias in face recognition to gender bias in hiring systems, where race and gender are referred to as sensitive attributes. In recent years, much progress has been made in ensuring fairness and reducing bias in standard machine learning settings. Among these efforts, learning representations that are fair with respect to the sensitive attributes has attracted increasing attention, owing to the flexibility with which deep learning can produce rich representations. In this article, we propose graph-fair, an algorithmic approach to learning fair representations under graph Laplacian regularization, which reduces both the separation between groups and the clustering within a group by encoding the sensitive-attribute information into the graph. We theoretically prove an underlying connection between graph regularization and distance correlation and show that the latter can be regarded as a standardized version of the former, with the additional advantage of being scale-invariant. We therefore adopt distance correlation as a fairness constraint that decreases the dependence between sensitive attributes and latent representations, an approach we call dist-fair. In contrast to existing approaches based on kernel dependence measures or adversarial generators, both graph-fair and dist-fair provide simple fairness constraints that eliminate the need for parameter tuning (e.g., choosing kernels) or for introducing adversarial networks. Experiments conducted on real-world corpora indicate that the proposed fairness constraints, applied to representation learning, achieve better fairness-utility tradeoffs than existing approaches.
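To make the graph-fair penalty concrete, the sketch below shows one plausible construction, not the paper's exact formulation: samples whose sensitive attributes differ are connected in the graph, so that minimizing the Laplacian quadratic form tr(Z^T L Z) = (1/2) * sum_ij W_ij ||z_i - z_j||^2 pulls cross-group representations together. The adjacency rule and the function name graph_fair_penalty are illustrative assumptions.

```python
import numpy as np

def graph_fair_penalty(Z, s):
    """Hypothetical graph-Laplacian fairness penalty (illustrative sketch).

    Z : (n, d) array of latent representations.
    s : (n,)  array of sensitive-attribute labels.

    Assumed graph: an edge of weight 1 between every pair of samples
    whose sensitive attributes differ. Minimizing
    tr(Z^T L Z) = 1/2 * sum_ij W_ij * ||z_i - z_j||^2
    then shrinks the separation between sensitive groups.
    """
    Z = np.asarray(Z, dtype=float)
    s = np.asarray(s)
    W = (s[:, None] != s[None, :]).astype(float)  # cross-group adjacency
    L = np.diag(W.sum(axis=1)) - W                # unnormalized Laplacian L = D - W
    return float(np.trace(Z.T @ L @ Z))
```

In practice such a term would be added to the task loss with a tradeoff weight; how that weight is chosen is not specified here.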
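The dist-fair constraint instead penalizes the empirical distance correlation of Szekely et al. (2007), a scale-invariant dependence measure in [0, 1] that vanishes in the population limit exactly under independence. Below is a minimal NumPy sketch of that standard estimator; using it as a penalty between latent codes and the sensitive attribute follows the abstract, while the variable names and the usage snippet are assumptions.

```python
import numpy as np

def _double_centered_dist(X):
    """Pairwise Euclidean distance matrix of X, double-centered."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()

def distance_correlation(X, Y):
    """Empirical distance correlation between samples X and Y.

    Returns a value in [0, 1]; larger values indicate stronger
    (possibly nonlinear) dependence between X and Y.
    """
    A = _double_centered_dist(X)
    B = _double_centered_dist(Y)
    dcov2_xy = (A * B).mean()   # squared distance covariance (V-statistic)
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    if dcov2_xx * dcov2_yy == 0:
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / np.sqrt(dcov2_xx * dcov2_yy)))

# Illustrative usage: penalize dependence between latent codes and a
# binary sensitive attribute (data and weight lam are made up here).
rng = np.random.default_rng(0)
Z = rng.normal(size=(128, 16))        # latent representations
s = rng.integers(0, 2, size=128)      # sensitive attribute
lam = 1.0                             # assumed tradeoff weight
penalty = lam * distance_correlation(Z, s)
```

Because distance correlation is scale-invariant, this penalty needs no kernel selection or adversarial network, which is the simplicity the abstract claims for dist-fair.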