Szepannek Gero, Lübke Karsten
Institute of Applied Computer Science, Stralsund University of Applied Sciences, Stralsund, Germany.
Institute for Empirical Research and Statistics, FOM University of Applied Sciences, Dortmund, Germany.
Front Artif Intell. 2021 Oct 14;4:681915. doi: 10.3389/frai.2021.681915. eCollection 2021.
Algorithmic scoring methods have been widely used in the finance industry for several decades to prevent risk and to automate and optimize decisions. Regulatory requirements, such as those issued by the Basel Committee on Banking Supervision (BCBS) or the EU data protection regulations, have led to increasing interest and research activity in understanding black box machine learning models by means of explainable machine learning. Even though this is a step in the right direction, such methods cannot guarantee fair scoring, as machine learning models are not necessarily unbiased and may discriminate against certain subpopulations, such as a particular race, gender, or sexual orientation, even if the corresponding variable itself is not used for modeling. This is also true for white box methods like logistic regression. In this study, a framework is presented that allows models to be analyzed and developed with regard to fairness. The proposed methodology is based on techniques of causal inference, and some of its methods can be linked to methods from explainable machine learning. A definition of counterfactual fairness is given, together with an algorithm that results in a fair scoring model. The concepts are illustrated by means of a transparent simulation and a popular real-world example, the German Credit data, using traditional scorecard models based on logistic regression and a weight of evidence variable pre-transformation. In contrast to previous studies in the field, a corrected version of the data is presented and used in our study. With the help of the simulation, the trade-off between fairness and predictive accuracy is analyzed. The results indicate that it is possible to remove unfairness without a strong decrease in performance, provided the correlation of the discriminatory attributes with the other predictor variables in the model is not too strong.
In addition, the challenge in explaining the resulting scoring model and the associated fairness implications to users is discussed.
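The weight of evidence pre-transformation used in traditional scorecards replaces each category of a binned predictor by the log-ratio of its good/bad distribution before fitting the logistic regression. A minimal sketch of this step (the toy data and the smoothing constant are illustrative, not taken from the study):

```python
import math

# Hypothetical toy data: each applicant has a binned feature value and a
# default label (1 = bad, 0 = good), as in a traditional scorecard setup.
applicants = [
    ("low",  1), ("low",  1), ("low",  0),
    ("mid",  0), ("mid",  1), ("mid",  0),
    ("high", 0), ("high", 0), ("high", 1),
]

def weight_of_evidence(data):
    """Compute WoE per category: log((share of goods) / (share of bads))."""
    total_good = sum(1 for _, y in data if y == 0)
    total_bad = sum(1 for _, y in data if y == 1)
    woe = {}
    for cat in {c for c, _ in data}:
        good = sum(1 for c, y in data if c == cat and y == 0)
        bad = sum(1 for c, y in data if c == cat and y == 1)
        # A small smoothing constant avoids log(0) for pure categories.
        woe[cat] = math.log(((good + 0.5) / total_good) /
                            ((bad + 0.5) / total_bad))
    return woe

woe = weight_of_evidence(applicants)
# Categories with more goods than bads get positive WoE, and vice versa;
# the transformed values then enter the logistic regression as predictors.
```

In a scorecard, each raw category is replaced by its WoE value, which keeps the model linear in the transformed inputs and directly interpretable.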
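One common way to operationalize the removal of a protected attribute's influence, related in spirit to the causal approach described above but not necessarily the authors' exact algorithm, is to regress each predictor on the protected attribute and keep only the residual, so that the score can no longer depend on the protected attribute through that predictor:

```python
def residualize(x, a):
    """Return the residuals of x after a least-squares regression on a
    binary protected attribute a (illustrative sketch, plain Python)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_a = sum(a) / n
    cov = sum((xi - mean_x) * (ai - mean_a) for xi, ai in zip(x, a))
    var = sum((ai - mean_a) ** 2 for ai in a)
    slope = cov / var
    # Residual = x - (intercept + slope * a); residuals are mean-zero and
    # uncorrelated with a, so a fair model can be fit on them instead of x.
    return [xi - mean_x - slope * (ai - mean_a) for xi, ai in zip(x, a)]

# Hypothetical example: a predictor that is strongly correlated with the
# protected attribute a; after residualization the correlation vanishes.
a = [0, 0, 1, 1]
x = [1.0, 2.0, 3.0, 4.0]
r = residualize(x, a)
```

The trade-off noted in the abstract appears here directly: the stronger the correlation between the protected attribute and the other predictors, the more predictive signal is removed along with the unfairness.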