Kaiser Permanente Washington Health Research Institute, Kaiser Permanente WA, Seattle, Washington.
Department of Radiology, University of Washington and Seattle Cancer Care Alliance, Seattle, Washington.
Cancer Epidemiol Biomarkers Prev. 2023 Apr 3;32(4):561-571. doi: 10.1158/1055-9965.EPI-22-0677.
BACKGROUND: Machine learning (ML) approaches facilitate risk prediction model development using high-dimensional predictors and higher-order interactions, at the cost of model interpretability and transparency. We compared the relative predictive performance of statistical and ML models to guide the selection of a modeling strategy for surveillance mammography outcomes in women with a personal history of breast cancer (PHBC).
METHODS: We cross-validated seven risk prediction models for two surveillance outcomes: failure (breast cancer within 12 months of a negative surveillance mammogram) and benefit (surveillance-detected breast cancer). We included 9,447 mammograms (495 failures, 1,414 benefits, and 7,538 nonevents) from 1996 to 2017, using 1:4 matched case-control samples of women with PHBC in the Breast Cancer Surveillance Consortium. We assessed the performance of conventional regression, regularized regressions (LASSO and elastic-net), and ML methods (random forests and gradient boosting machines) by evaluating their calibration and, among well-calibrated models, comparing the area under the receiver operating characteristic curve (AUC) and 95% confidence intervals (CI).
RESULTS: LASSO and elastic-net consistently provided well-calibrated predicted risks for surveillance failure and benefit. The AUCs of LASSO and elastic-net were both 0.63 (95% CI, 0.60-0.66) for surveillance failure and 0.66 (95% CI, 0.64-0.68) for surveillance benefit, the highest among well-calibrated models.
CONCLUSIONS: For predicting breast cancer surveillance mammography outcomes, regularized regression outperformed other modeling approaches and balanced the trade-off between model flexibility and interpretability.
IMPACT: Regularized regression may be preferred for developing risk prediction models in other contexts with rare outcomes, similar training sample sizes, and low-dimensional features.
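The evaluation order described in the methods (check calibration first, then compare AUC with a 95% CI) can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc`, `expected_observed_ratio`, and `bootstrap_auc_ci` are hypothetical, the expected/observed ratio and the percentile bootstrap are assumed stand-ins for whatever calibration and interval methods the study actually used, and the toy data are invented purely for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen event has a higher
    predicted risk than a randomly chosen nonevent (ties count as 1/2)."""
    pos = [r for y, r in zip(y_true, risk) if y == 1]
    neg = [r for y, r in zip(y_true, risk) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one nonevent")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expected_observed_ratio(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """Calibration-in-the-large: sum of predicted risks divided by the
    number of observed events. Values near 1 suggest good overall calibration."""
    observed = sum(y_true)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(risk) / observed


def bootstrap_auc_ci(
    y_true: Sequence[int],
    risk: Sequence[float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap 95% CI for the AUC (an assumed method,
    chosen here only because it is simple to write down)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if sum(ys) in (0, n):
            continue  # resample lacks events or nonevents; AUC undefined
        stats.append(auc(ys, [risk[i] for i in idx]))
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    lo = stats[int(0.025 * len(stats))]
    hi = stats[min(len(stats) - 1, int(0.975 * len(stats)))]
    return lo, hi


# Toy data (invented): 1 = event, 0 = nonevent, with out-of-fold predicted risks.
y_toy = [0, 0, 0, 0, 1, 0, 1, 1]
risk_toy = [0.10, 0.20, 0.30, 0.40, 0.35, 0.50, 0.70, 0.80]

eo = expected_observed_ratio(y_toy, risk_toy)
point = auc(y_toy, risk_toy)
ci_low, ci_high = bootstrap_auc_ci(y_toy, risk_toy)
print(f"E/O ratio: {eo:.2f}; AUC: {point:.2f} (95% CI, {ci_low:.2f}-{ci_high:.2f})")
```

In a workflow like the one described, these functions would be applied to held-out predictions from each cross-validation fold, and AUCs would be compared only among models whose calibration is judged acceptable.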