Chen Yongxin, Chen Siyi, Tang Wenjie, Kong Qingcong, Zhong Zhidan, Yu Xiaomeng, Sui Yi, Hu Wenke, Jiang Xinqing, Guo Yuan
Department of Radiology, Guangzhou First People's Hospital, No. 1 Panfu Rd, Guangzhou, 510180 China.
Department of Radiology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.
AJR Am J Roentgenol. 2025 Jan;224(1):e2431717. doi: 10.2214/AJR.24.31717. Epub 2024 Oct 16.
MRI radiomics has been explored for three-tiered classification of HER2 expression levels (i.e., HER2-zero, HER2-low, or HER2-positive) in patients with breast cancer, although an understanding of how such models reach their predictions is lacking. The purpose of this study was to develop and test multiparametric MRI radiomics machine learning models for differentiating three-tiered HER2 expression levels in patients with breast cancer, as well as to explain the contributions of model features through local and global interpretations with the use of Shapley additive explanation (SHAP) analysis. This retrospective study included 737 patients (mean age, 54.1 ± 10.6 [SD] years) with breast cancer from two centers (center 1 [n = 578] and center 2 [n = 159]), all of whom underwent multiparametric breast MRI and had HER2 expression determined after excisional biopsy. Analysis entailed two tasks: differentiating HER2-negative (i.e., HER2-zero or HER2-low) tumors from HER2-positive tumors (task 1) and differentiating HER2-zero tumors from HER2-low tumors (task 2). For each task, patients from center 1 were randomly assigned in a 7:3 ratio to a training set (task 1: n = 405; task 2: n = 284) or an internal test set (task 1: n = 173; task 2: n = 122); patients from center 2 formed an external test set (task 1: n = 159; task 2: n = 105). Radiomic features were extracted from early phase dynamic contrast-enhanced (DCE) imaging, T2-weighted imaging, and DWI. For each task, a support vector machine (SVM) was used for feature selection, a multiparametric radiomics score (radscore) was computed using feature weights from SVM correlation coefficients, conventional MRI and combined models were constructed, and model performances were evaluated. SHAP analysis was used to provide local and global interpretations of the model outputs.
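The 7:3 assignment and the radscore computation described above can be illustrated with a minimal sketch. It assumes that the radscore is a linear weighted sum of standardized feature values, which the abstract does not state explicitly; the function names, feature names, weights, and seed are hypothetical and are not taken from the study. The cohort sizes (578 for task 1; 284 + 122 = 406 for task 2) come from the reported set sizes.

```python
import random


def split_7_3(patient_ids, seed=0):
    """Randomly assign patients to training and internal test sets (7:3)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seed is an arbitrary, hypothetical choice
    n_train = round(0.7 * len(ids))
    return ids[:n_train], ids[n_train:]


def radscore(features, weights, intercept=0.0):
    """Assumed form of the score: weighted sum of standardized feature values."""
    return intercept + sum(weights[name] * value for name, value in features.items())


# Center 1 cohort sizes derived from the reported set sizes.
train1, test1 = split_7_3(range(578))  # task 1
train2, test2 = split_7_3(range(406))  # task 2 (284 + 122)
print(len(train1), len(test1))
print(len(train2), len(test2))

# Hypothetical weights and standardized feature values for one patient.
weights = {"dce_feature": 0.8, "t2_feature": -0.3, "dwi_feature": 0.5}
patient = {"dce_feature": 1.2, "t2_feature": 0.4, "dwi_feature": -0.6}
score = radscore(patient, weights)
print(round(score, 2))
```

Rounding 70% of each center 1 cohort reproduces the reported training and internal test set sizes for both tasks.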
In the external test set, for task 1, AUCs for the conventional MRI model, radscore, and the combined model were 0.624, 0.757, and 0.762, respectively; for task 2, the AUC for radscore was 0.754, and no conventional MRI model or combined model could be constructed. SHAP analysis identified early phase DCE imaging features as having the strongest influence for both tasks; T2-weighted imaging features also had a prominent role for task 2. The findings indicate suboptimal performance of MRI radiomics models for noninvasive characterization of HER2 expression. The study provides an example of the use of SHAP interpretation analysis to better understand predictions of imaging-based machine learning models.
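The distinction between local and global SHAP interpretation can be illustrated with a minimal sketch that computes exact Shapley values by enumerating feature coalitions. This is not the study's implementation: the toy linear model, its weights, and the three-patient data set are hypothetical, features absent from a coalition are replaced by their mean (one common convention), and practical SHAP implementations typically rely on approximations or model-specific algorithms rather than full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample; absent features take background values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += w * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear model over three features (e.g., DCE, T2, DWI).
weights = [0.8, -0.3, 0.5]
predict = lambda z: sum(w * v for w, v in zip(weights, z))

# Hypothetical feature values for three patients.
samples = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [2.0, 4.0, 1.0]]
background = [sum(col) / len(samples) for col in zip(*samples)]

# Local interpretation: one vector of contributions per patient.
local = [shapley_values(predict, row, background) for row in samples]

# Global interpretation: mean absolute contribution of each feature.
global_importance = [
    sum(abs(p[i]) for p in local) / len(local) for i in range(len(weights))
]
print(local[0])
print(global_importance)
```

For each patient, the contributions sum to the difference between that patient's prediction and the prediction at the background values, which is what allows a single output to be attributed feature by feature.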