Lai Chun-Chi, Chen Cheng-Yu, Chang Tzu-Hao
Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, 9th Floor, 301 Yuantong Road, Zhonghe District, Taipei, Taiwan, 886 66202589 ext 10928.
Division of General Surgery, Department of Surgery, New Taipei Municipal TuCheng Hospital, New Taipei City, Taiwan.
JMIR Cancer. 2025 Jul 18;11:e64685. doi: 10.2196/64685.
Breast cancer is the most prevalent form of cancer worldwide, with 2.3 million new diagnoses in 2022. Recent advancements in treatment have led to a shift in the use of chemotherapy-targeted immunotherapy from a postoperative adjuvant to a preoperative neoadjuvant approach in select cases, resulting in enhanced survival outcomes. A pathological complete response (pCR) is a critical prognostic marker, with higher pCR rates linked to improved overall and disease-free survival.
The objective of this study was to develop robust, machine learning-based prediction models for pCR following neoadjuvant therapy, leveraging clinical, laboratory, and imaging data.
A retrospective cohort study was conducted using data from the Taipei Medical University Clinical Research Database from 2015 to 2022. Eligible patients were those with breast cancer who received neoadjuvant therapy followed by curative surgical resection. Machine learning models were developed using 3 distinct sets of variables. Model 1 included 14 clinical features such as age, height, weight, tumor stage, receptor status, tumor markers, and intrinsic subtype. Model 2 expanded on this by incorporating additional laboratory data and comorbidities (29 variables in total). Model 3 added breast sonography response data to the clinical variables in model 1. Algorithms including logistic regression, random forest, support vector machines, and extreme gradient boosting were used. Feature selection was performed using recursive feature elimination with cross-validation, and model performance was assessed using accuracy and area under the receiver operating characteristic curve (AUROC).
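The feature-selection and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration only, assuming scikit-learn and a synthetic dataset shaped like model 1 (334 samples, 14 features); the function name `evaluate_rfecv_logreg`, the 5-fold splits, the standardization step, and the nested cross-validation layout are assumptions, not details reported by the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_rfecv_logreg(X, y, n_splits=5, seed=0):
    """Hypothetical helper: logistic regression with RFECV, scored by accuracy and AUROC."""
    # Inner folds choose how many features to keep; outer folds estimate performance.
    inner_cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    outer_cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed + 1)

    # Recursive feature elimination with cross-validation, dropping one feature per step.
    selector = RFECV(
        estimator=LogisticRegression(max_iter=1000),
        step=1,
        cv=inner_cv,
        scoring="roc_auc",
    )

    # Scaling and selection live inside the pipeline, so they are refit on each
    # outer training fold and never see the corresponding held-out fold.
    model = make_pipeline(StandardScaler(), selector, LogisticRegression(max_iter=1000))

    scores = cross_validate(model, X, y, cv=outer_cv, scoring=("accuracy", "roc_auc"))
    return {
        "accuracy_mean": float(np.mean(scores["test_accuracy"])),
        "accuracy_sd": float(np.std(scores["test_accuracy"])),
        "auroc_mean": float(np.mean(scores["test_roc_auc"])),
        "auroc_sd": float(np.std(scores["test_roc_auc"])),
    }


# Synthetic stand-in data (NOT the study data): 334 samples, 14 features,
# with roughly 40% positives to mimic the reported pCR proportion.
X, y = make_classification(
    n_samples=334,
    n_features=14,
    n_informative=5,
    n_redundant=2,
    weights=[0.6, 0.4],
    random_state=0,
)

results = evaluate_rfecv_logreg(X, y)
print(results)
```

Placing the selector inside the pipeline is a design choice made here to keep feature selection from leaking information across evaluation folds; the abstract does not say how the authors structured their validation.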
A total of 334 patients were analyzed, with 199 in the non-pCR group and 135 in the pCR group. Logistic regression combined with recursive feature elimination with cross-validation performed best among the algorithms evaluated. Model 1 attained a mean accuracy of 0.66 (SD 0.02) and a mean AUROC of 0.73 (SD 0.01). Adding laboratory data and comorbidities in model 2 did not yield a significant improvement, with a mean accuracy of 0.67 (SD 0.02) and a mean AUROC of 0.73 (SD 0.01). Adding breast sonography response in model 3 modestly improved predictive performance in the sonography group (accuracy 0.68; AUROC 0.60) compared with the nonsonography group (accuracy 0.66; AUROC 0.55). Although model 3 was based on a modest sample (41 patients), integrating sonography data appeared to add value in predicting pCR and warrants further investigation.
This study suggests that incorporating breast sonography into models with clinical and laboratory data may modestly improve pCR prediction. These findings are preliminary and require cautious interpretation. Further studies are needed to validate this approach and support its integration into a machine learning-based clinical workflow.