Sarkar Rahuldeb, Martin Christopher, Mattie Heather, Gichoya Judy Wawira, Stone David J, Celi Leo Anthony
Departments of Respiratory Medicine and Critical Care, Medway NHS Foundation Trust, Gillingham, Kent, UK.
Faculty of Life Sciences, King's College London, London, UK.
medRxiv. 2021 Jan 20:2021.01.19.21249222. doi: 10.1101/2021.01.19.21249222.
Despite wide utilisation of severity scoring systems for case-mix determination and benchmarking in the intensive care unit, the possibility of scoring bias across ethnicities has not been examined. Recent guidelines on the use of illness severity scores to inform triage decisions for allocation of scarce resources such as mechanical ventilation during the current COVID-19 pandemic warrant examination for possible bias in these models. We investigated the performance of three severity scoring systems (APACHE IVa, OASIS, SOFA) across ethnic groups in two large ICU databases in order to identify possible ethnicity-based bias.
Data from the eICU Collaborative Research Database and the Medical Information Mart for Intensive Care were analysed for score performance in Asians, African Americans, Hispanics and Whites after appropriate exclusions. Discrimination and calibration were determined for all three scoring systems in all four groups.
While measurements of discrimination, the area under the receiver operating characteristic curve (AUROC), were significantly different among the groups, they did not display any discernible systematic patterns of bias. In contrast, measurements of calibration, the standardised mortality ratio (SMR), indicated persistent, and in some cases significant, patterns of difference between Hispanics and African Americans versus Asians and Whites. The differences between African Americans and Whites were consistently statistically significant. While calibrations were imperfect for all groups, the scores consistently demonstrated a pattern of over-predicting mortality for African Americans and Hispanics.
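To make the two metrics concrete, the following is a minimal sketch of how AUROC and SMR might be computed per group. It is not the authors' analysis code. It assumes each patient record carries a group label, an observed in-hospital death flag, and a score-derived predicted mortality probability; the function names, the record layout, and the toy data are all hypothetical. AUROC is computed in its rank-based form, and SMR as observed deaths divided by expected deaths (the sum of predicted probabilities), so an SMR below 1 corresponds to over-predicted mortality.

```python
from collections import defaultdict


def auroc(died, risk):
    """Discrimination: probability that a randomly chosen death has a higher
    predicted risk than a randomly chosen survivor (ties count as half)."""
    pos = [r for d, r in zip(died, risk) if d == 1]
    neg = [r for d, r in zip(died, risk) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def smr(died, risk):
    """Calibration: observed deaths divided by expected deaths, where
    expected deaths are the sum of predicted mortality probabilities."""
    expected = sum(risk)
    if expected == 0:
        raise ValueError("expected deaths must be greater than zero")
    return sum(died) / expected


def metrics_by_group(records):
    """records: iterable of (group, died, predicted_risk) tuples.
    This layout is an assumption made for illustration."""
    died_by_group = defaultdict(list)
    risk_by_group = defaultdict(list)
    for group, died, risk in records:
        died_by_group[group].append(died)
        risk_by_group[group].append(risk)
    return {
        g: {
            "auroc": auroc(died_by_group[g], risk_by_group[g]),
            "smr": smr(died_by_group[g], risk_by_group[g]),
        }
        for g in died_by_group
    }


# Hypothetical toy data, not drawn from eICU-CRD or MIMIC.
toy = [
    ("A", 1, 0.8), ("A", 0, 0.2), ("A", 0, 0.4), ("A", 1, 0.6),
    ("B", 1, 0.9), ("B", 0, 0.5), ("B", 0, 0.7), ("B", 0, 0.9),
]
result = metrics_by_group(toy)
for group, m in sorted(result.items()):
    print(group, round(m["auroc"], 3), round(m["smr"], 3))
```

In this toy example, group B has an SMR well below 1, meaning fewer deaths were observed than the predicted risks imply. The sketch omits the confidence intervals and significance tests that a comparison between groups would require.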
The systematic differences in calibration across ethnic groups suggest that illness severity scores reflect bias in their predictions of mortality.
LAC is funded by the National Institutes of Health through NIBIB R01 EB017205. There was no specific funding for this study.