Li Yan, Zhang Xinyan, Akinyemiju Tomi, Ojesina Akinyemi I, Szychowski Jeff M, Liu Nianjun, Xu Bo, Yi Nengjun
Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
J Bioinform Genom. 2016 Sep;1(1). doi: 10.18454/jbg.2016.1.1.2. Epub 2016 Sep 15.
Many traditional clinical prognostic factors for cancer have been known for years, but they usually predict survival poorly. Genomic information is now more readily available, which offers opportunities to build more accurate prognostic models. The challenge is how to integrate clinical and genomic data to improve survival prediction. The common approach of jointly analyzing all types of covariates in a single model may not improve prediction because of the increased model complexity, and it cannot be easily applied to different datasets.
We proposed a two-stage procedure to better combine different sources of information for survival prediction, and applied it to two cancer datasets: myelodysplastic syndromes (MDS) and ovarian cancer. Our analysis suggests that prediction performance differs substantially across data types, and that combining clinical, gene expression, and mutation data using the two-stage procedure improves survival prediction, as measured by a higher concordance index and a lower prediction error.
The two-stage procedure can be implemented with the BhGLM package, which is freely available at http://www.ssg.uab.edu/bhglm/.
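The concordance index used above as an evaluation measure can be illustrated with a minimal sketch. This is not the BhGLM implementation; it is a plain-Python version of Harrell's C for right-censored data, and the function name `concordance_index` and its arguments are hypothetical names chosen for illustration. It assumes that a higher risk score means shorter expected survival, and it skips pairs with tied observed times for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored survival data (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- predicted risk; higher is assumed to mean earlier failure
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    print(concordance_index(times, events, [4, 3, 2, 1]))
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to a perfect ranking, so comparing this quantity for scores built from clinical data alone against scores built from the combined data gives one way to read the improvement reported in the abstract.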