Johnson Heather, Ali Amjad, Zhang Xuhui, Wang Tianyan, Simoulis Athanasios, Wingren Anette Gjörloff, Persson Jenny L
Olympia Diagnostics, Inc., Sunnyvale, CA 94086, USA.
Department of Molecular Biology, Umeå University, SE-901 87 Umeå, Sweden.
Cancers (Basel). 2022 Oct 28;14(21):5322. doi: 10.3390/cancers14215322.
Purpose: New biomarker tools are urgently needed to accurately predict treatment response in breast cancer, especially in the deadly triple-negative subtype. We aimed to develop gene-mutation-based machine learning (ML) algorithms as biomarker classifiers to predict response to first-line chemotherapy with high precision.
Methods: Random Forest ML was applied to screen algorithms built from various combinations of gene mutation profiles of primary tumors at diagnosis, using a TCGA Cohort (n = 399) with up to 150 months of follow-up as the training set and an MSK Cohort (n = 807) with up to 220 months of follow-up as the validation set. Breast cancer subtypes, including triple-negative and luminal A (ER+, PR+, and HER2−), were also assessed. The predictive performance of the candidate algorithms as classifiers was further assessed using logistic regression, Kaplan–Meier progression-free survival (PFS) plots, and univariate and multivariate Cox proportional hazards regression analyses.
Results: A novel algorithm, termed the 12-Gene Algorithm, was identified based on the mutation profiles of KRAS, PIK3CA, MAP3K1, MAP2K4, PTEN, TP53, CDH1, GATA3, KMT2C, ARID1A, RUNX1, and ESR1. In the TCGA Cohort, the algorithm distinguished non-progressed (responder) from progressed (non-responder) patients with an AUC of 0.96 (95% CI 0.94–0.98). It predicted PFS with a hazard ratio (HR) of 21.6 (95% CI 11.3–41.5; p < 0.0001) in all patients and with an HR of 19.3 (95% CI 3.7–101.3; p < 0.001) in the triple-negative subgroup (n = 42). The 12-Gene Algorithm was validated in the MSK Cohort, where it distinguished responders from non-responders with a similar AUC of 0.97 (95% CI 0.96–0.98) and predicted PFS in the triple-negative subgroup with an HR of 18.6 (95% CI 4.4–79.2; n = 75, p < 0.0001).
Conclusions: The novel 12-Gene Algorithm, based on multiple gene-mutation profiles identified through ML, has the potential to predict breast cancer response to treatment, especially in patients in the triple-negative subgroup, which may assist personalized therapy and reduce mortality.
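The hazard ratios reported above can be sanity-checked from their confidence intervals alone. The sketch below assumes the intervals are Wald-type intervals that are symmetric on the log scale, which is the usual convention for Cox models but is not stated in the abstract. Under that assumption, the geometric mean of the two bounds should recover the point estimate, and the width of the interval gives the standard error of log(HR). The function names and the 1% tolerance are illustrative choices, not part of the study.

```python
import math

# (label, HR, lower 95% bound, upper 95% bound) as reported in the abstract
reported = [
    ("TCGA, all patients", 21.6, 11.3, 41.5),
    ("TCGA, triple-negative", 19.3, 3.7, 101.3),
    ("MSK, triple-negative", 18.6, 4.4, 79.2),
]

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def implied_from_ci(lower, upper):
    """Return (HR, SE of log HR) implied by a log-symmetric 95% CI (assumed)."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    hr = math.exp((log_lo + log_hi) / 2)  # geometric mean of the bounds
    se = (log_hi - log_lo) / (2 * Z_95)   # half-width on the log scale / z
    return hr, se


def check_all(rows, rel_tol=0.01):
    """Compare each reported HR with the HR implied by its interval."""
    results = {}
    for label, hr, lo, hi in rows:
        implied_hr, se = implied_from_ci(lo, hi)
        consistent = math.isclose(hr, implied_hr, rel_tol=rel_tol)
        results[label] = (implied_hr, se, consistent)
    return results


if __name__ == "__main__":
    for label, (implied_hr, se, ok) in check_all(reported).items():
        print(f"{label}: implied HR {implied_hr:.2f}, "
              f"SE(log HR) {se:.2f}, consistent={ok}")
```

The much larger implied standard errors in the two triple-negative subgroups reflect their small sample sizes (n = 42 and n = 75), which is why those intervals are so wide despite similar point estimates.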