Chen Xiangning, Balko Justin M, Ling Fei, Jin Yabin, Gonzalez Anneliese, Zhao Zhongming, Chen Jingchun
410 AI, LLC, 10 Plummer Ct, Germantown, MD, 20876, USA.
Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, 2101 W End Ave, Nashville, TN, 37240, USA.
Heliyon. 2023 Mar 23;9(4):e14819. doi: 10.1016/j.heliyon.2023.e14819. eCollection 2023 Apr.
Triple negative breast cancers (TNBCs) are tumors with a poor treatment response and prognosis. In this study, we propose a new approach, candidate extraction from convolutional neural network (CNN) elements (CECE), for the discovery of biomarkers for TNBCs. We used the GSE96058 and GSE81538 datasets to build a CNN model to classify TNBCs and non-TNBCs, and used the model to make TNBC predictions for two additional datasets: The Cancer Genome Atlas (TCGA) breast cancer RNA sequencing data and the data from Fudan University Shanghai Cancer Center (FUSCC). Using correctly predicted TNBCs from the GSE96058 and TCGA datasets, we calculated saliency maps for these subjects and extracted the genes that the CNN model used to separate TNBCs from non-TNBCs. Among the TNBC signature patterns that the CNN models learned from the training data, we found a set of 21 genes that can classify TNBCs into two major classes, or CECE subtypes, with distinct overall survival rates (P = 0.0074). We replicated this subtype classification in the FUSCC dataset using the same 21 genes, and the two subtypes had similar differential overall survival rates (P = 0.0490). When all TNBCs from the three datasets were combined, the CECE II subtype had a hazard ratio of 1.94 (95% CI, 1.25-3.01; P = 0.0032). The results demonstrate that the spatial patterns learned by the CNN models can be utilized to discover interacting biomarkers otherwise unlikely to be identified by traditional approaches.
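The saliency-map step described above can be illustrated with a minimal sketch. This is not the authors' model or pipeline: `toy_score` is a hypothetical stand-in for a trained CNN (a single 1D convolution with ReLU and a sigmoid output), the gene names and expression values are made up, and the gradient is approximated by central finite differences rather than the backpropagation a deep learning framework would normally use. The idea is only that features whose small changes move the class score most receive the highest saliency and become candidates for extraction.

```python
import math

def toy_score(x, kernel=(0.5, -1.0, 0.25), bias=0.0):
    """Hypothetical stand-in for a trained CNN: 1D convolution, ReLU, sum, sigmoid."""
    k = len(kernel)
    total = bias
    for i in range(len(x) - k + 1):
        act = sum(kernel[j] * x[i + j] for j in range(k))
        total += max(act, 0.0)  # ReLU keeps only positively activated windows
    return 1.0 / (1.0 + math.exp(-total))

def saliency(score_fn, x, eps=1e-5):
    """Absolute central-difference gradient of the score with respect to each input."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grads.append(abs(score_fn(up) - score_fn(down)) / (2 * eps))
    return grads

def top_features(names, sal, n):
    """Return the n feature names with the largest saliency values."""
    ranked = sorted(zip(names, sal), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:n]]

# Made-up expression values for five hypothetical genes of one sample.
genes = ["g1", "g2", "g3", "g4", "g5"]
expression = [0.2, 1.5, 0.1, 0.9, 0.3]

sal = saliency(toy_score, expression)
candidates = top_features(genes, sal, 2)
print(candidates)
```

In this toy input only the middle convolution window is active, so the genes it covers receive nonzero saliency while the two outermost genes receive none. In the study, the analogous ranking was computed on correctly predicted TNBC samples to extract the genes driving the classification.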