Zhang Xinyan, Li Yan, Akinyemiju Tomi, Ojesina Akinyemi I, Buckhaults Phillip, Liu Nianjun, Xu Bo, Yi Nengjun
Department of Biostatistics, University of Alabama at Birmingham, Alabama 35294.
Department of Epidemiology, University of Alabama at Birmingham, Alabama 35294.
Genetics. 2017 Jan;205(1):89-100. doi: 10.1534/genetics.116.189191. Epub 2016 Nov 9.
Heterogeneity in tumor characteristics, prognosis, and survival among cancer patients has been a persistent problem for decades. Currently, prognosis and outcome predictions are based on clinical factors, on molecular profiling data, or on both. However, using only clinical or molecular information directly may result in inaccurate prognosis and prediction. One of the main shortcomings of past studies is the failure to incorporate prior biological information into the predictive model, despite strong evidence of the pathway-based genetic nature of cancer, i.e., the potential for oncogenes to be grouped into pathways based on biological functions such as cell survival, proliferation, and metastatic dissemination. To address this problem, we propose a two-stage approach that incorporates pathway information into prognostic modeling with large-scale gene expression data. In the first stage, we fit all predictors within each pathway using penalized Cox and Bayesian hierarchical Cox models. In the second stage, we combine the cross-validated prognostic scores of all pathways obtained in the first stage as new predictors to build an integrated prognostic model for prediction. We apply the proposed method to two independent datasets from The Cancer Genome Atlas (TCGA), one for breast cancer and one for ovarian cancer, predicting overall survival from large-scale gene expression profiling data. The results from both datasets show that the proposed approach not only improves survival prediction compared with alternative analyses that ignore pathway information, but also identifies significant biological pathways.
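The two-stage procedure described in the abstract can be illustrated with a minimal sketch on simulated data. This is not the authors' implementation: a ridge-penalized Cox model fitted by plain gradient ascent stands in for the penalized and Bayesian hierarchical Cox models, and the simulated patients, the pathway names, the fold assignment, and every function name (`fit_ridge_cox`, `cv_pathway_score`, `c_index`) are assumptions made for illustration only.

```python
import math
import random


def fit_ridge_cox(X, time, event, penalty=1.0, lr=0.1, n_iter=200):
    """Fit a ridge-penalized Cox model by gradient ascent; return coefficients."""
    n, p = len(X), len(X[0])
    order = sorted(range(n), key=lambda i: -time[i])  # longest follow-up first
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [-penalty * b for b in beta]  # gradient of the ridge term
        risk_sum = 0.0       # sum of exp(eta) over the current risk set
        risk_x = [0.0] * p   # sum of exp(eta) * x over the current risk set
        for i in order:
            w = math.exp(sum(b * x for b, x in zip(beta, X[i])))
            risk_sum += w
            for k in range(p):
                risk_x[k] += w * X[i][k]
            if event[i]:  # only observed events contribute likelihood terms
                for k in range(p):
                    grad[k] += X[i][k] - risk_x[k] / risk_sum
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def linear_score(beta, row):
    """Prognostic score: the linear predictor of a fitted Cox model."""
    return sum(b * x for b, x in zip(beta, row))


def cv_pathway_score(X, time, event, cols, n_folds=5):
    """Stage 1: out-of-fold prognostic score using one pathway's genes."""
    sub = [[row[c] for c in cols] for row in X]
    n = len(sub)
    score = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        beta = fit_ridge_cox([sub[i] for i in train],
                             [time[i] for i in train],
                             [event[i] for i in train])
        for i in range(fold, n, n_folds):  # held-out samples only
            score[i] = linear_score(beta, sub[i])
    return score


def c_index(score, time, event):
    """Concordance index: a higher score should mean shorter survival."""
    concordant = comparable = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if score[i] > score[j]:
                    concordant += 1
                elif score[i] == score[j]:
                    concordant += 0.5
    return concordant / comparable


# Simulated data (assumption): 150 patients, 3 pathways of 4 genes each;
# only the genes in pathway_0 affect the hazard.
random.seed(0)
pathways = {"pathway_0": [0, 1, 2, 3],
            "pathway_1": [4, 5, 6, 7],
            "pathway_2": [8, 9, 10, 11]}
X = [[random.gauss(0, 1) for _ in range(12)] for _ in range(150)]
time, event = [], []
for row in X:
    eta = 0.8 * sum(row[c] for c in pathways["pathway_0"])
    t_event = random.expovariate(math.exp(eta))  # exponential survival time
    t_censor = random.expovariate(0.3)           # independent censoring time
    time.append(min(t_event, t_censor))
    event.append(t_event <= t_censor)

# Stage 1: one cross-validated prognostic score per pathway.
stage1 = {name: cv_pathway_score(X, time, event, cols)
          for name, cols in pathways.items()}

# Stage 2: the pathway scores become the predictors of an integrated model.
Z = [list(scores) for scores in zip(*stage1.values())]
gamma = fit_ridge_cox(Z, time, event, penalty=0.1)
integrated = [linear_score(gamma, z) for z in Z]

for name, scores in stage1.items():
    print(name, "cross-validated C-index:", round(c_index(scores, time, event), 3))
print("integrated model C-index:", round(c_index(integrated, time, event), 3))
```

Using out-of-fold scores in the first stage means each patient's pathway score comes from a model that never saw that patient, which limits overfitting when the scores are reused as second-stage predictors. The integrated C-index printed here is computed on the same patients used to fit the second stage, so it is likely optimistic; an independent test set or an outer cross-validation loop would give a fairer estimate.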