Brown Clare C, Thomsen Michael, Amick Benjamin C, Tilford J Mick, Bryant-Moore Keneshia, Gomez-Acevedo Horacio
Department of Health Policy and Management, Fay W Boozman College of Public Health, University of Arkansas for Medical Sciences, 4301 W Markham St Slot #820-12, Little Rock, AR, 72205, USA.
Department of Epidemiology, Fay W Boozman College of Public Health, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
J Racial Ethn Health Disparities. 2025 Jan 29. doi: 10.1007/s40615-025-02296-x.
To evaluate algorithmic fairness in low birthweight predictive models.
This study analyzed insurance claims (n = 9,990,990; 2013-2021) linked with birth certificates (n = 173,035; 2014-2021) from the Arkansas All Payers Claims Database (APCD).
Low birthweight (< 2500 g) predictive models included four approaches (logistic regression, elastic net, linear discriminant analysis, and gradient boosting machines [GBM]), each fit with and without racial/ethnic information. Model performance was assessed overall, among Hispanic individuals, and among non-Hispanic White, Black, Native Hawaiian/Other Pacific Islander, and Asian individuals using multiple measures of predictive performance (i.e., AUC [area under the receiver operating characteristic curve] scores, calibration, sensitivity, and specificity).
AUC scores were lower (indicating underperformance) for Black and Asian individuals relative to White individuals. In the strongest-performing model (i.e., GBM), the AUC scores for Black (0.718 [95% CI: 0.705-0.732]) and Asian (0.655 [95% CI: 0.582-0.728]) populations were lower than the AUC for White individuals (0.764 [95% CI: 0.754-0.775]). Model performance measured using AUC was comparable between models that included and excluded race/ethnicity; however, sensitivity (i.e., the percentage of records correctly predicted as "low birthweight" among those who actually had low birthweight) was lower and calibration was weaker for Black individuals when race/ethnicity was excluded, suggesting underprediction.
This study found that race-blind models resulted in underprediction and reduced algorithmic performance, measured using sensitivity and calibration, for Black populations. Such underprediction could unfairly reduce the allocation of resources needed to address perinatal health inequities. Population health management programs should carefully consider algorithmic fairness in predictive models and the resource allocation decisions they inform.
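The study's fairness audit amounts to computing performance metrics separately within each racial/ethnic subgroup and comparing them. A minimal sketch of that per-subgroup evaluation is below; the data, threshold, and group labels are illustrative toys, not the study's APCD data, and the study additionally assessed calibration and specificity across four model families.

```python
# Illustrative per-subgroup fairness audit: compute AUC and sensitivity
# within each racial/ethnic group. Toy data only; the study used insurance
# claims linked to birth certificates and four model families.

def auc(labels, scores):
    """AUC via the Mann-Whitney statistic: probability that a randomly
    chosen positive case outranks a randomly chosen negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity(labels, preds):
    """Share of true low-birthweight records correctly flagged."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")

# Toy records: (group, true low-birthweight label, model risk score)
records = [
    ("White", 1, 0.9), ("White", 0, 0.2), ("White", 1, 0.7), ("White", 0, 0.4),
    ("Black", 1, 0.6), ("Black", 0, 0.5), ("Black", 1, 0.3), ("Black", 0, 0.1),
]

threshold = 0.5  # hypothetical decision threshold
by_group = {}
for g in {r[0] for r in records}:
    ys = [y for grp, y, _ in records if grp == g]
    ss = [s for grp, _, s in records if grp == g]
    preds = [1 if s >= threshold else 0 for s in ss]
    by_group[g] = {"auc": auc(ys, ss), "sensitivity": sensitivity(ys, preds)}
```

In this toy example the model ranks cases well for one group but misses half the true low-birthweight cases in the other, the kind of subgroup sensitivity gap the study reports for race-blind models.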