School of Public Health, Yale University, USA.
Brief Bioinform. 2010 Jul;11(4):385-93. doi: 10.1093/bib/bbp070. Epub 2010 Feb 1.
The development of high-throughput technologies has made it possible to survey the whole genome. Genomic studies have been conducted extensively to search for markers with predictive power for the prognosis of complex diseases such as cancer, diabetes and obesity. Most existing statistical analyses focus on developing marker selection techniques, while little attention is paid to the underlying prognosis models. In this article, we review three commonly used prognosis models, namely the Cox, additive risk and accelerated failure time models. We conduct simulations and show that gene identification can be unsatisfactory under model misspecification. We analyze three cancer prognosis studies under the three models and show that the gene identification results, the prediction performance of all identified genes combined, and the reproducibility of each identified gene are model-dependent. We suggest that in practical data analysis, more attention should be paid to the model assumptions, and multiple models may need to be considered.
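For orientation, the three models named above are usually written in the following textbook forms. This is a sketch only: the notation is assumed here and is not taken from the article. T denotes the survival time, X the vector of gene expression covariates, \beta the regression coefficients, \lambda_0(t) an unspecified baseline hazard, and \varepsilon a random error with unspecified distribution.

```latex
% Cox proportional hazards model: covariates act multiplicatively on the hazard
\lambda(t \mid X) = \lambda_0(t)\,\exp(\beta^{\top} X)

% Additive risk model: covariates act additively on the hazard
\lambda(t \mid X) = \lambda_0(t) + \beta^{\top} X

% Accelerated failure time (AFT) model: covariates act linearly on the log survival time
\log T = \alpha + \beta^{\top} X + \varepsilon
```

Because the covariates enter on different scales (multiplicative hazard, additive hazard, log time), a gene whose effect is well described under one form need not be selected under another, which is consistent with the model dependence reported in the abstract.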