Han Yiqun, Wang Jiayu, Xu Binghe
Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College. No. 17, Panjiayuan Nanli, Chaoyang District, Beijing 100021, China.
J Cancer. 2021 Jan 1;12(3):936-945. doi: 10.7150/jca.52439. eCollection 2021.
This study aimed to develop and validate a prediction model for pathological complete response (pCR) to neoadjuvant chemotherapy (NCT) in triple-negative breast cancer (TNBC).
We systematically searched Gene Expression Omnibus, ArrayExpress, and PubMed for gene expression profiles of operable TNBC eligible for NCT. Molecular heterogeneity was assessed by hierarchical clustering, and the biological profiles of differentially expressed genes were investigated with Gene Ontology analysis, Kyoto Encyclopedia of Genes and Genomes analysis, and Gene Set Enrichment Analysis (GSEA). Next, machine-learning algorithms, including random forest analysis and least absolute shrinkage and selection operator (LASSO) analysis, were performed in parallel, and the intersection of the significant genes was entered into binary logistic regression to complete variable selection. The predictive response score (pRS) was built as the product of gene expression and the coefficient obtained from the logistic regression. Finally, the cohorts were randomly divided in a 7:3 ratio into a training cohort and a validation cohort, and a nomogram was constructed from the independent predictors of pCR.
A total of 217 individuals with complete clinicopathological information from four cohort datasets (GSE32646, GSE25065, GSE25055, GSE21974) were included. Based on the microarray data, a six-gene panel (ATP4B, FBXO22, FCN2, RRP8, SMERK2, TET3) was identified. A nomogram combining pRS and clinical tumor size stage was established, and its performance was validated by calibration curves and receiver operating characteristic curves, with areas under the curve of 0.704 and 0.756, respectively. GSEA indicated that biological processes including apoptosis, hypoxia, mTORC1 signaling, and myogenesis, as well as oncogenic signatures of EGFR and RAF, were activated and associated with an inferior response.
This study provides a robust prediction model for pCR and reveals potential mechanisms underlying the distinct responses to NCT in TNBC. These findings are promising and warrant further validation.
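The scoring and validation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the pRS is the sum, over the six panel genes, of expression multiplied by the logistic regression coefficient. The abstract does not report the coefficients, so the values below are hypothetical placeholders, and the function names are illustrative only.

```python
import random

# Hypothetical coefficients: the abstract names the six genes but not their weights.
COEFFICIENTS = {
    "ATP4B": 0.8, "FBXO22": -0.5, "FCN2": 0.3,
    "RRP8": -0.7, "SMERK2": 0.4, "TET3": -0.2,
}

def predictive_response_score(expression, coefficients=COEFFICIENTS):
    """Assumed form of the pRS: sum of expression * coefficient over panel genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)

def split_cohort(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into training and validation sets (7:3 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With 217 samples, `split_cohort` yields 152 training and 65 validation samples. Computing `roc_auc` separately on each subset, using pCR status as the label and the model output as the score, mirrors the kind of evaluation the abstract reports.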