Huang Yuhong, Wei Lihong, Hu Yalan, Shao Nan, Lin Yingyu, He Shaofu, Shi Huijuan, Zhang Xiaoling, Lin Ying
Breast Disease Center, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.
Department of Pathology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.
Front Oncol. 2021 Aug 18;11:706733. doi: 10.3389/fonc.2021.706733. eCollection 2021.
To investigate whether radiomics features extracted from multi-parametric MRI, combined with machine learning approaches, can non-invasively predict the molecular subtype and androgen receptor (AR) expression of breast cancer.
Patients diagnosed with clinical stage T2-4 breast cancer from March 2016 to July 2020 were retrospectively enrolled. The molecular subtypes and AR expression in pre-treatment biopsy specimens were assessed. A total of 4,198 radiomics features were extracted from the pre-biopsy multi-parametric MRI (including dynamic contrast-enhanced T1-weighted images, fat-suppressed T2-weighted images, and apparent diffusion coefficient maps) of each patient. We applied several feature selection strategies, including the least absolute shrinkage and selection operator (LASSO), recursive feature elimination (RFE), maximum relevance minimum redundancy (mRMR), Boruta, and Pearson correlation analysis, to select the optimal features. We then built 120 diagnostic models, using distinct classification algorithms and feature sets divided by MRI sequence and selection strategy, to predict the molecular subtype and AR expression of breast cancer in the testing dataset of leave-one-out cross-validation (LOOCV). The performance of the binary classification models was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The performance of the multiclass classification models was assessed using AUC, overall accuracy, precision, recall, and F1-score.
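As an illustration of the workflow described above, the following is a minimal sketch of LASSO-based feature selection followed by an MLP classifier, evaluated with LOOCV and the binary metrics listed. It is not the authors' code: the data are synthetic, scikit-learn is an assumed toolkit, and the feature count, `alpha`, number of retained features, and network size are arbitrary placeholders. Placing the selector inside the pipeline so that it is refitted in every fold is also an assumption about good practice, not a detail stated in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics feature table (NOT the study data).
X, y = make_classification(
    n_samples=60, n_features=200, n_informative=10,
    weights=[0.3, 0.7], random_state=0,
)

# Scaling, LASSO selection, and the classifier are all refitted per fold,
# so the held-out patient never influences which features are kept.
pipe = make_pipeline(
    StandardScaler(),
    # threshold=-inf with max_features keeps the 20 largest |coefficients|.
    SelectFromModel(Lasso(alpha=0.05), threshold=-np.inf, max_features=20),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)

# One out-of-fold probability per sample; the AUC is computed on the pooled set.
proba = cross_val_predict(
    pipe, X, y, cv=LeaveOneOut(), method="predict_proba"
)[:, 1]
pred = (proba >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()


def ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return float(num) / float(den) if den else float("nan")


metrics = {
    "auc": roc_auc_score(y, proba),
    "accuracy": ratio(tp + tn, len(y)),
    "sensitivity": ratio(tp, tp + fn),
    "specificity": ratio(tn, tn + fp),
    "ppv": ratio(tp, tp + fp),
    "npv": ratio(tn, tn + fn),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because LOOCV yields a single prediction per held-out patient, per-fold AUC is undefined; pooling the out-of-fold probabilities before computing AUC, as done here, is one common workaround.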
A total of 162 patients (mean age, 46.91 ± 10.08 years) were enrolled in this study; 30 had low AR expression and 132 had high AR expression. HR+/HER2- cancers were diagnosed in 56 patients (34.6%), HER2+ cancers in 81 patients (50.0%), and TNBC in 25 patients (15.4%). There was no significant difference in clinicopathologic characteristics between the low-AR and high-AR groups (P > 0.05), except for menopausal status, ER, PR, HER2, and Ki-67 index (P = 0.043, <0.001, <0.001, 0.015, and 0.006, respectively). No significant difference in clinicopathologic characteristics was observed among the three molecular subtypes, except for AR status and Ki-67 (P < 0.001 and P = 0.012, respectively). The Multilayer Perceptron (MLP) showed the best performance in discriminating AR expression, with an AUC of 0.907 and an accuracy of 85.8% in the testing dataset. MLP also achieved the highest performance in discriminating TNBC vs. non-TNBC (AUC: 0.965, accuracy: 92.6%), HER2+ vs. HER2- (AUC: 0.840, accuracy: 79.0%), and HR+/HER2- vs. others (AUC: 0.860, accuracy: 82.1%). The micro-AUC of the MLP multiclass classification model was 0.896, and the overall accuracy was 0.735.
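For readers unfamiliar with the multiclass metrics reported above, the sketch below shows one common way to compute a micro-averaged AUC, overall accuracy, and per-class precision, recall, and F1-score for a three-class problem. The labels and probabilities are made-up toy values, not study results, and the use of scikit-learn is an assumption; the abstract does not state how the authors computed the micro-AUC.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    roc_auc_score,
)
from sklearn.preprocessing import label_binarize

# Class indices for the three subtypes (toy example, not study data).
classes = ["HR+/HER2-", "HER2+", "TNBC"]
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])

# Hypothetical predicted class probabilities; each row sums to 1.
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
    [0.1, 0.3, 0.6],
])

# Micro-averaging: flatten the one-hot labels and the scores so every
# (sample, class) pair is treated as one binary decision.
y_onehot = label_binarize(y_true, classes=[0, 1, 2])
micro_auc = roc_auc_score(y_onehot.ravel(), proba.ravel())

# Overall accuracy and per-class precision / recall / F1 use the argmax class.
y_pred = proba.argmax(axis=1)
overall_acc = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0
)

print(f"micro-AUC: {micro_auc:.3f}, overall accuracy: {overall_acc:.3f}")
for name, p, r, f in zip(classes, precision, recall, f1):
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Micro-averaging weights every sample-class decision equally, so with imbalanced subtypes it is dominated by the larger classes; a macro-average would instead weight each subtype equally.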
Multi-parametric MRI-based radiomics combined with machine learning approaches provides a promising method for non-invasively predicting the molecular subtype and AR expression of breast cancer.