Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Laboratory of Molecular Oncology, Peking University Cancer Hospital & Institute, Beijing, China.
Department of Public Health Sciences, University of Chicago, Chicago, IL, USA.
Breast Cancer Res. 2024 Oct 29;26(1):148. doi: 10.1186/s13058-024-01905-7.
BACKGROUND: For patients with breast cancer undergoing neoadjuvant chemotherapy (NACT), most of the existing prediction models of pathologic complete response (pCR) using clinicopathological features were based on standard statistical models like logistic regression, while models based on machine learning mostly utilized imaging data and/or gene expression data. This study aims to develop a robust and accessible machine learning model to predict pCR using clinicopathological features alone, which can be used to facilitate clinical decision-making in diverse settings.
METHODS: The model was developed and validated within the National Cancer Data Base (NCDB, 2018-2020) and an external cohort at the University of Chicago (2010-2020). We compared logistic regression and machine learning models, and examined whether incorporating quantitative clinicopathological features improved model performance. Decision curve analysis was conducted to assess the model's clinical utility.
RESULTS: We identified 56,209 NCDB patients receiving NACT (pCR rate: 34.0%). The machine learning model incorporating quantitative clinicopathological features showed the best discrimination performance among all the fitted models [area under the receiver operating characteristic curve (AUC): 0.785, 95% confidence interval (CI): 0.778-0.792], along with outstanding calibration performance. The model performed best among patients with hormone receptor positive/human epidermal growth factor receptor 2 negative (HR+/HER2-) breast cancer (AUC: 0.817, 95% CI: 0.802-0.832); and by adopting a 7% prediction threshold, the model achieved 90.5% sensitivity and 48.8% specificity, with decision curve analysis finding a 23.1% net reduction in chemotherapy use. In the external testing set of 584 patients (pCR rate: 33.4%), the model maintained robust performance both overall (AUC: 0.711, 95% CI: 0.668-0.753) and in the HR+/HER2- subgroup (AUC: 0.810, 95% CI: 0.742-0.878).
CONCLUSIONS: The study developed a machine learning model (https://huolab.cri.uchicago.edu/sample-apps/pcrmodel) to predict pCR in breast cancer patients undergoing NACT, and the model demonstrated robust discrimination and calibration performance. It performed particularly well among patients with HR+/HER2- breast cancer, where it has the potential to identify patients who are less likely to achieve pCR and who could therefore consider alternative treatment strategies over chemotherapy. The model can also serve as a robust baseline that can be integrated with smaller datasets containing additional granular features in future research.
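The threshold-based results above (sensitivity, specificity, and the net reduction in chemotherapy use at a 7% predicted-pCR threshold) can be illustrated with a minimal sketch. It assumes the standard decision curve analysis definitions, in which patients with a predicted probability at or above the threshold receive chemotherapy and the model is compared with treating everyone; the abstract does not state the exact formulas used. The function name `threshold_metrics` and the toy data are hypothetical and are not taken from the study.

```python
def threshold_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, and decision-curve quantities at one threshold.

    y_true: iterable of 0/1 outcomes (1 = pCR achieved).
    y_prob: iterable of predicted pCR probabilities.
    threshold: probability cutoff in (0, 1); predictions >= threshold are
        treated as "likely pCR, give chemotherapy" (an assumption of this sketch).
    The data must contain at least one patient of each outcome.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")

    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_pcr = prob >= threshold
        if predicted_pcr and label:
            tp += 1
        elif predicted_pcr and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1

    n = tp + fp + fn + tn
    odds = threshold / (1.0 - threshold)  # weight given to a false positive

    # Net benefit of treating according to the model vs. treating everyone.
    nb_model = tp / n - (fp / n) * odds
    prevalence = (tp + fn) / n
    nb_treat_all = prevalence - (1.0 - prevalence) * odds

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "net_benefit": nb_model,
        # Net fraction of patients spared treatment, relative to treat-all.
        "net_reduction": (nb_model - nb_treat_all) / odds,
    }


if __name__ == "__main__":
    # Toy data for illustration only; not patient data from the study.
    outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    predictions = [0.50, 0.05, 0.30, 0.20, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
    for name, value in threshold_metrics(outcomes, predictions, 0.07).items():
        print(f"{name}: {value:.3f}")
```

A low threshold such as 7% weights a missed responder far more heavily than an unnecessary course of chemotherapy, which is why the net reduction can be negative whenever the model misses even a few patients who would have achieved pCR.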