Hegarty Sarah E, Linn Kristin A, Zhang Hong, Teeple Stephanie, Albert Paul S, Parikh Ravi B, Courtright Katherine, Kent David M, Chen Jinbo
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, P.R. China.
medRxiv. 2025 Feb 2:2025.01.31.25321489. doi: 10.1101/2025.01.31.25321489.
The proliferation of algorithm-assisted decision making has prompted calls for careful assessment of algorithmic fairness. One popular fairness metric, equal opportunity, demands parity in true positive rates (TPRs) across population subgroups. However, we highlight a critical but overlooked weakness in this measure: at a given decision threshold, TPRs differ whenever the underlying risk distribution differs across subgroups, even if the model captures the underlying risks equally well in every subgroup. Failing to account for differences in risk distributions may lead to misleading conclusions about performance disparities. To address this issue, we introduce a novel metric called adjusted TPR (aTPR), which modifies subgroup-specific TPRs to reflect performance relative to the risk distribution in a common reference subgroup. Evaluating fairness using aTPRs promotes equal treatment for equal risk by reflecting whether individuals with similar underlying risks have similar chances of being identified as high risk by the model, regardless of subgroup membership. We demonstrate our method through numerical experiments that explore a range of differential calibration relationships and on a real-world data set used to predict 6-month mortality risk in an inpatient sample, with the goal of increasing timely referrals for palliative care consultations.
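The weakness described above can be illustrated with a small simulation. The sketch below is not the authors' experiment and does not implement aTPR; it only shows that raw TPRs can differ between two subgroups even when the model score equals the true risk in both. The Beta risk distributions, the threshold of 0.3, and the helper name `expected_tpr` are assumptions chosen for illustration.

```python
import numpy as np


def expected_tpr(risk, score, threshold):
    """Expected TPR when each outcome is Bernoulli(risk).

    Individuals with score > threshold are flagged as high risk. The
    expected number of events among flagged individuals is
    sum(risk * flagged), and the expected number of events overall is
    sum(risk).
    """
    flagged = score > threshold
    return float(np.sum(risk * flagged) / np.sum(risk))


rng = np.random.default_rng(0)
n = 200_000

# Hypothetical subgroups with different underlying risk distributions.
risk_a = rng.beta(2.0, 8.0, size=n)  # mean risk 0.2
risk_b = rng.beta(4.0, 6.0, size=n)  # mean risk 0.4

# Assumption: the model is "perfect" in both subgroups (score == true risk),
# so any TPR gap comes from the risk distributions alone.
threshold = 0.3
tpr_a = expected_tpr(risk_a, risk_a, threshold)
tpr_b = expected_tpr(risk_b, risk_b, threshold)

print(f"TPR, subgroup A: {tpr_a:.3f}")
print(f"TPR, subgroup B: {tpr_b:.3f}")
```

Under these assumed distributions the two TPRs come out at roughly 0.38 and 0.85, although the model ranks and scores individuals identically well in both subgroups. A naive equal-opportunity comparison would read this gap as a performance disparity, which is the kind of misleading conclusion the adjusted metric is meant to avoid.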