Zhang Linying, Richter Lauren R, Kim Tevin, Hripcsak George
Department of Biomedical Informatics, Columbia University, New York, NY, USA.
Institute for Informatics, Data Science, and Biostatistics, Washington University in St. Louis, St. Louis, MO, USA.
medRxiv. 2024 Jan 9:2024.01.07.24300943. doi: 10.1101/2024.01.07.24300943.
Data-driven clinical prediction algorithms are widely used by clinicians. Understanding what factors can affect the performance and fairness of data-driven algorithms is an important step toward achieving equitable healthcare. To investigate the impact of modeling choices on algorithmic performance and fairness, we use a case study in which we build a prediction algorithm for estimating glomerular filtration rate (GFR) from the patient's electronic health record (EHR). We compare three distinct approaches to estimating GFR: CKD-EPI equations, epidemiological models, and EHR-based models. For the epidemiological and EHR-based models, four machine learning models of varying computational complexity (linear regression, support vector machine, random forest regression, and neural network) were compared. Performance metrics included root mean squared error (RMSE), median difference, and the proportion of GFR estimates within 30% of the measured GFR value (P30). Differential performance between the non-African American and African American groups was used to assess algorithmic fairness with respect to race. Our study showed that the race variable had a negligible effect on error, accuracy, and differential performance. Furthermore, including more relevant clinical features (e.g., common comorbidities of chronic kidney disease) and using more complex machine learning models, namely random forest regression, significantly lowered the GFR estimation error. However, the performance gap between African American and non-African American patients did not narrow: the estimation error for African American patients remained consistently higher than that for non-African American patients, indicating that more objective patient characteristics should be identified and included to improve algorithm performance.
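The three performance metrics named above (RMSE, median difference, and P30) are standard and can be sketched directly. The function below is a minimal illustration, not code from the study; the function name and example values are hypothetical.

```python
import numpy as np

def gfr_metrics(measured, estimated):
    """Compute RMSE, median difference, and P30 for GFR estimates.

    P30 is the proportion of estimates falling within 30% of the
    corresponding measured GFR value.
    """
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    diff = estimated - measured
    rmse = float(np.sqrt(np.mean(diff ** 2)))          # root mean squared error
    median_diff = float(np.median(diff))               # median of (estimated - measured)
    p30 = float(np.mean(np.abs(diff) <= 0.30 * measured))  # fraction within 30%
    return {"rmse": rmse, "median_diff": median_diff, "p30": p30}

# Hypothetical example values (mL/min/1.73 m^2), not data from the study:
measured = [60, 45, 90, 30, 75]
estimated = [55, 50, 85, 40, 80]
print(gfr_metrics(measured, estimated))
```

Computing these metrics separately for each subgroup (e.g., African American vs. non-African American patients) and comparing them is one straightforward way to quantify the differential performance the abstract describes.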