Zhang Yifan, Yang William, Li Dan, Yang Jack Y, Guan Renchu, Yang Mary Qu
MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA.
Department of Computer Science, Carnegie Mellon University School of Computer Science, 5000 Forbes Ave, Pittsburgh, 15213, USA.
BMC Med Genomics. 2018 Nov 20;11(Suppl 5):104. doi: 10.1186/s12920-018-0419-x.
Breast cancer is the most common type of invasive cancer in women and accounts for approximately 18% of all cancer deaths worldwide. Somatic mutations are well known to play an essential role in cancer development. We therefore propose that a prognostic prediction model integrating somatic mutations with gene expression can improve survival prediction for cancer patients and reveal the genetic mutations associated with survival.
Differential expression analysis was used to identify breast cancer-related genes. A genetic algorithm (GA) and univariate Cox regression analysis were applied to select survival-related genes. DAVID was used for enrichment analysis of the somatically mutated gene set. The performance of the survival predictors was assessed with a Cox regression model and the concordance index (C-index).
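As an illustration of the evaluation metric named above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not the authors' implementation: the function name `concordance_index`, its arguments, and the choice to skip pairs with tied survival times are assumptions made for this example. A pair of patients is treated as comparable only when the patient with the shorter follow-up time had an observed event, and the pair counts as concordant when that patient also has the higher predicted risk.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- follow-up time per patient
    events      -- 1/True if the event (death) was observed, 0/False if censored
    risk_scores -- predicted risk; a higher score should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): risk is perfectly ordered against time.
print(concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.1]))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the C-index values reported in this abstract.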
We investigated the genome-wide gene expression profiles and somatic mutations of 1091 breast invasive carcinoma cases from The Cancer Genome Atlas (TCGA). We identified 118 genes with high hazard ratios as candidate breast cancer survival risk genes (log-rank p < 0.0001 and C-index = 0.636). This gene set included multiple genes known to be related to breast cancer survival, including FOXR2, FOXD1, MTNR1B and SDC1. Applying the genetic algorithm (GA) then yielded an optimal gene set consisting of 88 genes with a higher C-index (log-rank p < 0.0001 and C-index = 0.656). We validated this gene set on an independent breast cancer data set and achieved similar performance (log-rank p < 0.0001 and C-index = 0.614). Moreover, we identified 25 functional annotations, 15 gene ontology terms and 14 pathways that were significantly enriched in the genes that showed distinct mutation patterns between the survival risk groups. These functional gene sets were used as new features for the survival prediction model. In particular, our results suggested that the Fanconi anemia pathway plays an important role in breast cancer prognosis.
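The abstract does not describe the GA's operators or parameters, so the following is only a generic sketch of how a genetic algorithm can search for a gene subset. Everything in it is an assumption made for illustration: the function `ga_select`, truncation selection, one-point crossover, bit-flip mutation, elitism, and the hypothetical `toy_fitness` function. In a setting like the one described here, the fitness function would plausibly be the C-index of a Cox model fitted on the selected genes.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=40,
              mutation_rate=0.05, seed=0):
    """Search for a boolean feature mask that maximizes `fitness` (sketch).

    Requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    # Each individual is a mask: True means the gene is included.
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = [scored[0][:]]              # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each gene with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best


def toy_fitness(mask):
    """Hypothetical fitness: reward the first 5 features, penalize the rest."""
    return sum(mask[:5]) - 0.1 * sum(mask[5:])


best_mask = ga_select(20, toy_fitness)
print(sum(best_mask), "features selected, fitness", toy_fitness(best_mask))
```

Because the best individual is carried over unchanged in each generation, the best fitness found never decreases, which is one common way to keep a GA from losing a good gene set once it has been found.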
Our study indicated that the expression levels of gene signatures remain effective indicators for breast cancer survival prediction. Combining gene expression information with other types of features derived from somatic mutations can further improve the performance of survival prediction. The pathways that our study associated with survival risk can be investigated further with the aim of improving cancer patient survival.