
Addressing bias in bagging and boosting regression models.

Author Information

Ugirumurera Juliette, Bensen Erik A, Severino Joseph, Sanyal Jibonananda

Affiliations

Computational Science Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO, 80401, USA.

Department of Statistics and Data Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA.

Publication Information

Sci Rep. 2024 Aug 8;14(1):18452. doi: 10.1038/s41598-024-68907-5.

Abstract

As artificial intelligence (AI) becomes widespread, increasing attention is being paid to investigating bias in machine learning (ML) models. Previous research has concentrated on classification problems, with little emphasis on regression models. This paper presents an easy-to-apply and effective methodology for mitigating bias in bagging and boosting regression models that is also applicable to any model trained by minimizing a differentiable loss function. Our methodology measures bias rigorously and extends the ML model's loss function with a regularization term that penalizes high correlations between model errors and protected attributes. We applied our approach to three popular tree-based ensemble models: a random forest (RF), a gradient-boosted tree model (GBT), and an extreme gradient boosting model (XGBoost). We implemented our methodology in a case study on predicting road-level traffic volume, where the RF, GBT, and XGBoost models achieved high accuracy. Despite this accuracy, the models performed poorly on roads in minority-populated areas. Our bias mitigation approach reduced minority-related bias by over 50%.
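The mitigation idea described in the abstract, extending a differentiable loss with a regularization term that discourages association between model errors and a protected attribute, can be illustrated with a short custom objective for XGBoost. The sketch below is not the authors' implementation: it assumes a squared-error base loss, uses the squared covariance between residuals and a per-sample protected attribute as a simple differentiable stand-in for the correlation penalty, and the names make_fair_squared_error, protected, and lam are illustrative only.

import numpy as np
import xgboost as xgb

def make_fair_squared_error(protected, lam=10.0):
    # Build an XGBoost custom objective: 0.5*(pred - y)^2 per sample plus a
    # global penalty lam * cov(residuals, protected)^2, which discourages
    # errors that vary systematically with the protected attribute.
    a = np.asarray(protected, dtype=float)
    a_centered = a - a.mean()
    n = float(len(a))

    def objective(preds, dtrain):
        y = dtrain.get_label()
        err = preds - y                                # residuals
        cov = np.dot(err, a_centered) / n              # cov(residuals, protected)
        # Gradient and diagonal Hessian of (squared error + covariance penalty)
        grad = err + 2.0 * lam * cov * a_centered / n
        hess = np.ones_like(preds) + 2.0 * lam * (a_centered / n) ** 2
        return grad, hess

    return objective

# Usage sketch (X_train, y_train, and the protected-attribute vector a_train are assumed):
# dtrain = xgb.DMatrix(X_train, label=y_train)
# booster = xgb.train({"max_depth": 6, "eta": 0.1}, dtrain,
#                     num_boost_round=300,
#                     obj=make_fair_squared_error(a_train, lam=10.0))

The covariance form is chosen here only because its gradient with respect to each prediction is simple; the paper's penalty is stated in terms of correlation between errors and protected attributes, and lam would need to be tuned to trade accuracy against bias reduction.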

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6bf3/11310502/2913e2fe6b00/41598_2024_68907_Fig1_HTML.jpg
