Evaluating Algorithmic Bias in 30-Day Hospital Readmission Models: Retrospective Analysis.

Affiliations

Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States.

Johns Hopkins Center for Population Health Information Technology, Baltimore, MD, United States.

Publication information

J Med Internet Res. 2024 Apr 18;26:e47125. doi: 10.2196/47125.

Abstract

BACKGROUND

The adoption of predictive algorithms in health care comes with the potential for algorithmic bias, which could exacerbate existing disparities. Fairness metrics have been proposed to measure algorithmic bias, but their application to real-world tasks is limited.

OBJECTIVE

This study aims to evaluate the algorithmic bias associated with the application of common 30-day hospital readmission models and assess the usefulness and interpretability of selected fairness metrics.

METHODS

We used 10.6 million adult inpatient discharges from Maryland and Florida from 2016 to 2019 in this retrospective study. Three models predicting 30-day hospital readmissions were evaluated: the LACE Index, a modified HOSPITAL score, and a modified Centers for Medicare & Medicaid Services (CMS) readmission measure, each applied as-is (using existing coefficients) and retrained (recalibrated with 50% of the data). Predictive performance and bias measures were evaluated for the overall population, between Black and White populations, and between low- and other-income groups. Bias measures included the parity of the false negative rate (FNR), false positive rate (FPR), 0-1 loss, and generalized entropy index. Racial bias, represented by FNR and FPR differences, was stratified to explore shifts in algorithmic bias across populations.
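The group-level bias measures named above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and the per-individual benefit definition b_i = ŷ_i − y_i + 1 with α = 2 follows a common formulation of the generalized entropy index, which the abstract does not specify.

```python
import numpy as np

def group_rates(y_true, y_pred, mask):
    """FNR, FPR, and 0-1 loss for the subgroup selected by a boolean mask."""
    yt, yp = y_true[mask], y_pred[mask]
    fn = np.sum((yt == 1) & (yp == 0))          # readmissions the model missed
    fp = np.sum((yt == 0) & (yp == 1))          # flagged patients not readmitted
    fnr = fn / max(np.sum(yt == 1), 1)
    fpr = fp / max(np.sum(yt == 0), 1)
    loss = np.mean(yt != yp)                    # 0-1 loss (misclassification rate)
    return fnr, fpr, loss

def generalized_entropy_index(y_true, y_pred, alpha=2):
    """Generalized entropy index over per-individual benefits
    b_i = y_pred_i - y_true_i + 1 (a common convention); alpha must not be 0 or 1."""
    b = y_pred.astype(float) - y_true.astype(float) + 1.0
    mu = b.mean()
    return np.mean((b / mu) ** alpha - 1.0) / (alpha * (alpha - 1))
```

Parity between two groups (e.g., Black vs. White patients) would then be assessed by comparing `group_rates` output across the complementary masks; equal FNR and FPR across groups corresponds to the equalized-odds notion of fairness.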

RESULTS

The retrained CMS model demonstrated the best predictive performance (area under the curve: 0.74 in Maryland and 0.68-0.70 in Florida), and the modified HOSPITAL score demonstrated the best calibration (Brier score: 0.16-0.19 in Maryland and 0.19-0.21 in Florida). Calibration was better in White (compared to Black) populations and other-income (compared to low-income) groups, while the area under the curve was higher or similar in the Black (compared to White) populations. The retrained CMS model and modified HOSPITAL score had the lowest racial and income bias in Maryland. In Florida, both of these models had the lowest income bias overall, and the modified HOSPITAL score showed the lowest racial bias. In both states, the White and higher-income populations showed a higher FNR, while the Black and low-income populations showed a higher FPR and a higher 0-1 loss. When stratified by hospital and population composition, these models demonstrated heterogeneous algorithmic bias across contexts and populations.

CONCLUSIONS

Caution must be taken when interpreting fairness measures at face value. A higher FNR or FPR could reflect missed opportunities or wasted resources, but these measures could also reflect health care use patterns and gaps in care. Relying solely on statistical notions of bias could obscure or underplay the causes of health disparity. The imperfect health data, analytic frameworks, and underlying health systems must be carefully considered. Fairness measures can serve as a useful routine assessment to detect disparate model performance but are insufficient to inform mechanisms or policy changes. However, such an assessment is an important first step toward data-driven improvement to address existing health disparities.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a5b6/11066744/bfbaa084aca2/jmir_v26i1e47125_fig1.jpg
