Lucas Mary M, Schootman Mario, Laryea Jonathan A, Orcutt Sonia T, Li Chenghui, Ying Jun, Rumpel Jennifer A, Yang Christopher C
College of Computing and Informatics, Drexel University, Philadelphia, PA.
Division of Community Health and Research, Department of Internal Medicine, College of Medicine, the University of Arkansas for Medical Sciences, Springdale, AR.
JCO Clin Cancer Inform. 2024 Nov;8. doi: 10.1200/CCI.23.00194. Epub 2024 Oct 9.
Machine learning algorithms are used for predictive modeling in medicine, but studies often do not evaluate or report on the potential biases of the models. Our purpose was to develop clinical prediction models for readmission after surgery in colorectal cancer (CRC) patients and to examine their potential for racial bias.
We used the 2012-2020 American College of Surgeons' National Surgical Quality Improvement Program (ACS-NSQIP) Participant Use File and Targeted Colectomy File. Patients were categorized into four race groups: White, Black or African American, Other, and Unknown/Not Reported. Potential predictive features were identified from studies of risk factors for 30-day readmission in CRC patients. We compared four machine learning-based methods: logistic regression (LR), multilayer perceptron (MLP), random forest (RF), and XGBoost (XGB). Model bias was assessed using the false negative rate (FNR) difference, the false positive rate (FPR) difference, and disparate impact.
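The FNR and FPR differences named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy data, and the choice of a reference group are assumptions made for illustration, since the abstract does not state how the differences were computed or which group served as the reference.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, groups):
    """Return {group: (FNR, FPR)} from binary labels and binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth:
            key = "tp" if pred else "fn"
        else:
            key = "fp" if pred else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]  # patients who were actually readmitted
        negatives = c["fp"] + c["tn"]  # patients who were not readmitted
        fnr = c["fn"] / positives if positives else float("nan")
        fpr = c["fp"] / negatives if negatives else float("nan")
        rates[group] = (fnr, fpr)
    return rates


def rate_differences(rates, reference):
    """Return {group: (FNR diff, FPR diff)} relative to an assumed reference group."""
    ref_fnr, ref_fpr = rates[reference]
    return {
        group: (fnr - ref_fnr, fpr - ref_fpr)
        for group, (fnr, fpr) in rates.items()
        if group != reference
    }


# Toy data, not study data: 1 = readmitted (y_true) or predicted readmission (y_pred).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_error_rates(y_true, y_pred, groups)
diffs = rate_differences(rates, reference="A")
```

A positive FNR difference means the model misses more true readmissions in that group than in the reference group; a positive FPR difference means it raises more false alarms.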
In all, 112,077 patients were included, of whom 67.2% were White, 9.2% were Black, 5.6% were of Other race, and 18% had no race recorded. There were significant differences in the area under the receiver operating characteristic curve (AUROC), FPR, and FNR between race groups across all models. Notably, patients in the 'Other' race category had a higher FNR than Black patients in all but the XGB model, while Black patients had a higher FPR than White patients in some models. Patients in the 'Other' category consistently had the lowest FPR. Under the 80% rule for disparate impact, the models consistently met the threshold for unfairness for the 'Other' race category.
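The 80% rule mentioned above can be sketched as follows. This is a hedged illustration, not the study's implementation: it assumes disparate impact is the ratio of each group's rate of positive predictions to that of a reference group, with ratios below 0.8 flagged. The abstract does not give the exact definition or reference group used, and the function names and toy data are hypothetical.

```python
def disparate_impact(y_pred, groups, reference):
    """Ratio of each group's positive-prediction rate to the reference group's rate."""
    totals, flagged = {}, {}
    for pred, group in zip(y_pred, groups):
        totals[group] = totals.get(group, 0) + 1
        flagged[group] = flagged.get(group, 0) + (1 if pred else 0)

    ref_rate = flagged[reference] / totals[reference]
    if ref_rate == 0:
        raise ValueError("reference group has no positive predictions")
    return {
        group: (flagged[group] / totals[group]) / ref_rate
        for group in totals
        if group != reference
    }


def violates_80_percent_rule(ratio, threshold=0.8):
    """Assumed reading of the rule: flag a ratio that falls below the threshold."""
    return ratio < threshold


# Toy predictions, not study data: 1 = model predicts readmission.
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
groups = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

ratios = disparate_impact(y_pred, groups, reference="A")
flags = {group: violates_80_percent_rule(r) for group, r in ratios.items()}
```

In this toy example group B is flagged because its positive-prediction rate is half that of the reference group, while group C matches the reference and is not flagged.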
Predictive models for 30-day readmission after colorectal surgery may perform unequally across race groups, potentially propagating inequalities in the delivery of care and in patient outcomes if their predictions are used to direct care.