Liang Yuxuan, Chao Hanqing, Zhang Jiajin, Wang Ge, Yan Pingkun
Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, United States.
Meta Radiol. 2024 Sep;2(3). doi: 10.1016/j.metrad.2024.100084. Epub 2024 Jun 13.
The fairness of artificial intelligence and machine learning models, which is often compromised by imbalanced datasets, has long been a concern. While many efforts aim to minimize model bias, this study suggests that traditional fairness evaluation methods may themselves be biased, highlighting the need for an evaluation scheme that uses multiple metrics, since results can vary under different criteria. Moreover, the limited data size of minority groups introduces significant data uncertainty, which can undermine judgments of fairness. This paper introduces an innovative evaluation approach that estimates data uncertainty in minority groups through bootstrapping from majority groups, enabling a more objective statistical assessment. Extensive experiments reveal that traditional evaluation methods may have drawn inaccurate conclusions about model fairness. The proposed method delivers an unbiased fairness assessment by adeptly addressing the inherent complications of model evaluation on imbalanced datasets. The results show that such a comprehensive evaluation can provide greater confidence when adopting these models.
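The abstract describes the core procedure only at a high level: repeatedly subsample the majority group down to the minority group's size, recompute the evaluation metric on each bootstrap replicate, and judge the minority group's observed metric against that size-matched distribution. The paper's own implementation is not reproduced here; the Python sketch below is a minimal illustration under that reading, assuming AUC as the evaluation metric. The function name bootstrap_fairness_test, the two-sided empirical p-value, and all variable names are hypothetical, not taken from the paper.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def bootstrap_fairness_test(y_true_maj, y_score_maj,
                                y_true_min, y_score_min,
                                n_boot=1000, seed=0):
        """Estimate whether the minority-group metric is within the
        uncertainty expected from its small sample size alone, by
        subsampling the majority group down to the minority group's size.
        (Hypothetical sketch; AUC is assumed as the metric.)"""
        y_true_maj = np.asarray(y_true_maj)
        y_score_maj = np.asarray(y_score_maj)
        rng = np.random.default_rng(seed)
        n_min = len(y_true_min)
        boot_scores = []
        for _ in range(n_boot):
            # Draw a majority-group subsample matched to the minority size.
            idx = rng.choice(len(y_true_maj), size=n_min, replace=True)
            yt, ys = y_true_maj[idx], y_score_maj[idx]
            if len(np.unique(yt)) < 2:  # AUC is undefined with one class
                continue
            boot_scores.append(roc_auc_score(yt, ys))
        boot_scores = np.asarray(boot_scores)
        observed = roc_auc_score(y_true_min, y_score_min)
        # Two-sided empirical p-value: how extreme is the minority metric
        # relative to the size-matched majority bootstrap distribution?
        p = np.mean(np.abs(boot_scores - boot_scores.mean())
                    >= abs(observed - boot_scores.mean()))
        return observed, boot_scores, p

    if __name__ == "__main__":
        # Synthetic demo: 5000 majority samples vs. 100 minority samples.
        rng = np.random.default_rng(1)
        y_maj = rng.integers(0, 2, 5000)
        s_maj = np.clip(y_maj * 0.3 + rng.normal(0.5, 0.25, 5000), 0, 1)
        y_min = rng.integers(0, 2, 100)
        s_min = np.clip(y_min * 0.3 + rng.normal(0.5, 0.25, 100), 0, 1)
        auc_min, boot, p = bootstrap_fairness_test(y_maj, s_maj, y_min, s_min)
        print(f"minority AUC={auc_min:.3f}, "
              f"bootstrap mean={boot.mean():.3f}, p={p:.3f}")

A large p-value here would suggest the apparent metric gap is consistent with small-sample uncertainty alone, whereas a small p-value would point to a genuine performance disparity, which is the kind of distinction the paper argues naive single-number comparisons miss.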