Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland; Department of Medicine (Biomedical Informatics), Stanford University, Stanford, USA.
Department of Medicine (Biomedical Informatics), Stanford University, Stanford, USA; Clinical AI Implementation and Research Lab, Leiden University Medical Centre, Leiden, the Netherlands.
EBioMedicine. 2023 Jun;92:104632. doi: 10.1016/j.ebiom.2023.104632. Epub 2023 Jun 1.
Machine learning (ML) predictions are becoming increasingly integrated into medical practice. One commonly used method, ℓ1-penalised logistic regression (LASSO), can estimate patient risk for disease outcomes but is limited in that it provides only point estimates. Bayesian logistic LASSO regression (BLLR) models, in contrast, provide distributions over risk predictions, giving clinicians a better understanding of predictive uncertainty, but they are rarely implemented.
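For reference, the textbook formulation of ℓ1-penalised logistic regression and its Bayesian reading is written out below. This is the standard form, not necessarily the exact parameterisation used in the study. Here x_i is a patient's feature vector, y_i ∈ {0, 1} the outcome, λ the penalty strength, 𝒟 the training data, and the intercept β_0 is unpenalised (equivalently, given a flat prior). The Horseshoe+ prior evaluated in the study is shown in the form in which it is commonly written.

```latex
% LASSO logistic regression: penalised negative log-likelihood
(\hat\beta_0, \hat{\boldsymbol\beta}) = \arg\min_{\beta_0, \boldsymbol\beta}
  -\sum_{i=1}^{n}\Big[ y_i \eta_i - \log\big(1 + e^{\eta_i}\big) \Big]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,
\qquad \eta_i = \beta_0 + \mathbf{x}_i^\top \boldsymbol\beta

% Bayesian reading: an independent Laplace prior makes the LASSO fit the posterior mode
p(\beta_j \mid \lambda) = \tfrac{\lambda}{2}\, e^{-\lambda \lvert \beta_j \rvert}
\;\Longrightarrow\;
-\log p(\beta_0, \boldsymbol\beta \mid \mathcal{D})
  = -\log p(\mathcal{D} \mid \beta_0, \boldsymbol\beta)
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert + \text{const}

% Horseshoe+ alternative: heavier-tailed hierarchical shrinkage on each coefficient
\beta_j \mid \lambda_j \sim \mathcal{N}(0, \lambda_j^2), \qquad
\lambda_j \mid \eta_j, \tau \sim \mathrm{C}^{+}(0, \tau \eta_j), \qquad
\eta_j \sim \mathrm{C}^{+}(0, 1)

% BLLR: the risk for a new patient x* is a random variable under the posterior;
% its spread quantifies uncertainty, and its mean is the posterior predictive risk
r^\ast = \sigma\big(\beta_0 + \mathbf{x}^{\ast\top} \boldsymbol\beta\big), \quad
\Pr(y^\ast = 1 \mid \mathbf{x}^\ast, \mathcal{D})
  = \int r^\ast \, p(\beta_0, \boldsymbol\beta \mid \mathcal{D}) \, d\beta_0 \, d\boldsymbol\beta,
\quad \sigma(z) = \frac{1}{1 + e^{-z}}
```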
This study evaluates the predictive performance of different BLLRs compared with standard logistic LASSO regression, using real-world, high-dimensional, structured electronic health record (EHR) data from cancer patients initiating chemotherapy at a comprehensive cancer centre. To predict the risk of acute care utilization (ACU) after the start of chemotherapy, multiple BLLR models were compared against a LASSO model using an 80-20 random train-test split and 10-fold cross-validation.
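The LASSO baseline and evaluation protocol described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the study's pipeline: the data are synthetic, the ℓ1-penalised fit uses proximal gradient descent (ISTA) rather than whatever solver the authors used, the λ grid is hypothetical, 10-fold cross-validation is assumed to run inside the 80% training portion to choose λ, and the 95% CI comes from a percentile bootstrap, which may differ from how the reported intervals were computed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_lasso_logistic(X, y, lam, n_iter=500):
    """Minimise mean log-loss + lam * ||w||_1 by proximal gradient descent (ISTA).

    The intercept b is left unpenalised, as is conventional for the LASSO.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, where L bounds the Lipschitz constant of the mean log-loss gradient
    L = np.linalg.norm(np.column_stack([X, np.ones(n)]), 2) ** 2 / (4.0 * n)
    step = 1.0 / L
    for _ in range(n_iter):
        r = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the linear predictor
        w = w - step * (X.T @ r) / n
        b = b - step * r.mean()
        # Soft-thresholding is the proximal operator of step * lam * ||w||_1
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w, b


def auroc(y, score):
    """P(random positive outscores random negative); ties count one half."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


def cv_select_lambda(X, y, lams, k=10, seed=0):
    """Pick the penalty with the highest mean validation AUROC over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mean_auc = []
    for lam in lams:
        aucs = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            w, b = fit_lasso_logistic(X[trn], y[trn], lam)
            aucs.append(auroc(y[val], X[val] @ w + b))
        mean_auc.append(np.mean(aucs))
    return lams[int(np.argmax(mean_auc))]


def bootstrap_ci(y, score, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for the AUROC (an assumed CI method)."""
    rng = np.random.default_rng(seed)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, len(y), len(y))
        if y[idx].min() == y[idx].max():  # skip resamples containing a single class
            continue
        stats.append(auroc(y[idx], score[idx]))
    return np.percentile(stats, [2.5, 97.5])


# Synthetic stand-in for EHR features (hypothetical data, not the study cohort)
rng = np.random.default_rng(42)
n, p = 500, 30
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [1.5, -1.0, 0.8]  # sparse ground truth
y = (rng.random(n) < sigmoid(X @ true_w)).astype(int)

# 80-20 random split; cross-validation only ever sees the training 80%
perm = rng.permutation(n)
train, test = perm[: int(0.8 * n)], perm[int(0.8 * n):]
lams = np.array([0.001, 0.01, 0.05, 0.1])  # hypothetical penalty grid
lam = cv_select_lambda(X[train], y[train], lams)
w, b = fit_lasso_logistic(X[train], y[train], lam)
test_auc = auroc(y[test], X[test] @ w + b)
lo, hi = bootstrap_ci(y[test], X[test] @ w + b)
print(f"lambda={lam}, test AUROC={test_auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Because the AUROC depends only on how scores are ranked, the sketch scores patients with the linear predictor directly; passing it through the sigmoid would give the same AUROC.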
This study included 8439 patients. The LASSO model predicted ACU with an area under the receiver operating characteristic curve (AUROC) of 0.806 (95% CI: 0.775-0.834). BLLR with a Horseshoe+ prior and a posterior approximated by Metropolis-Hastings sampling showed similar performance, with an AUROC of 0.807 (95% CI: 0.780-0.834), while offering the advantage of an uncertainty estimate for each prediction. In addition, BLLR could identify predictions that were too uncertain to be classified automatically. Stratifying BLLR uncertainties by patient subgroup showed that predictive uncertainty differs significantly across race, cancer type, and stage.
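Posterior sampling and uncertainty-based deferral can be illustrated with a random-walk Metropolis-Hastings sampler. To keep the sketch short it swaps in a simpler prior than the study's: independent Laplace priors on the slopes (the Bayesian LASSO) instead of the Horseshoe+ hierarchy. The step size, chain length, synthetic data, decision threshold, the rule that defers any patient whose 95% credible interval straddles that threshold, and the binary subgroup label are all hypothetical choices for illustration; the abstract does not say how the study defined "too uncertain".

```python
import numpy as np


def log_posterior(theta, X, y, prior_scale=1.0):
    """Unnormalised log posterior of a Bayesian LASSO logistic regression.

    theta = [b, w]: the slopes w get independent Laplace(0, prior_scale) priors
    (a stand-in for Horseshoe+); the intercept b gets a weak N(0, 5^2) prior.
    """
    b, w = theta[0], theta[1:]
    z = b + X @ w
    # Bernoulli-logit log-likelihood; logaddexp(0, z) = log(1 + e^z) computed stably
    loglik = np.sum(y * z - np.logaddexp(0.0, z))
    logprior = -np.sum(np.abs(w)) / prior_scale - 0.5 * (b / 5.0) ** 2
    return loglik + logprior


def metropolis_hastings(X, y, n_samples=4000, burn_in=2000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings; returns post-burn-in draws and acceptance rate."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1] + 1)
    lp = log_posterior(theta, X, y)
    draws, accepted = [], 0
    for t in range(burn_in + n_samples):
        proposal = theta + step * rng.normal(size=theta.size)  # symmetric Gaussian proposal
        lp_new = log_posterior(proposal, X, y)
        # Symmetric proposal, so the acceptance ratio reduces to the posterior ratio
        if np.log(rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
            accepted += 1
        if t >= burn_in:
            draws.append(theta.copy())
    return np.array(draws), accepted / (burn_in + n_samples)


def predictive_summary(draws, X_new, level=0.95):
    """Posterior mean risk and equal-tailed credible interval for each new patient."""
    # probs[s, i] = predicted risk of patient i under posterior draw s
    probs = 1.0 / (1.0 + np.exp(-(draws[:, :1] + draws[:, 1:] @ X_new.T)))
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(probs, [alpha, 1.0 - alpha], axis=0)
    return probs.mean(axis=0), lo, hi


# Synthetic stand-in for EHR features (hypothetical data, not the study cohort)
rng = np.random.default_rng(1)
n, p = 300, 5
X = rng.normal(size=(n, p))
true_w = np.array([1.5, -1.0, 0.0, 0.0, 0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(int)

draws, acc_rate = metropolis_hastings(X[:240], y[:240])
mean_risk, lo, hi = predictive_summary(draws, X[240:])

threshold = 0.5  # hypothetical decision threshold
# Hypothetical deferral rule: the credible interval straddles the threshold
uncertain = (lo < threshold) & (hi > threshold)
print(f"acceptance rate {acc_rate:.2f}; {uncertain.sum()} of {uncertain.size} "
      f"test patients flagged as too uncertain to classify automatically")

# Interval width as a per-patient uncertainty score, averaged within a
# hypothetical binary subgroup label (a stand-in for, e.g., cancer stage)
group = rng.integers(0, 2, size=uncertain.size)
width = hi - lo
print({g: round(float(width[group == g].mean()), 3) for g in (0, 1)})
```

Random-walk proposals mix slowly as the number of coefficients grows, which matters for high-dimensional EHR features; a Horseshoe+ version would also need to sample the local and global scales (λ_j, η_j, τ) alongside the coefficients.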
BLLRs are a promising yet underutilised tool that increases explainability by providing distributions over risk estimates while offering a level of performance similar to that of standard LASSO-based models. Additionally, these models can identify patient subgroups with higher uncertainty, which can augment clinical decision-making.
This work was supported in part by the National Library of Medicine of the National Institutes of Health under Award Number R01LM013362. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.