Cai Tianxi, Tonini Giulia, Lin Xihong
Department of Biostatistics, Harvard University, 655 Huntington Avenue, Boston, Massachusetts 02115, USA.
Biometrics. 2011 Sep;67(3):975-86. doi: 10.1111/j.1541-0420.2010.01544.x. Epub 2011 Jan 31.
There is growing evidence that genomic and proteomic research holds great potential for irrevocably changing the practice of medicine. The ability to identify important genomic and biological markers for risk assessment can have a great impact on public health, from disease prevention to detection to treatment selection. However, the potentially large number of markers and the complexity of the relationship between the markers and the outcome of interest pose a major challenge in developing accurate risk prediction models. The standard approach to identifying important markers often assesses the marginal effect of each individual marker on a phenotype of interest. When multiple markers relate to the phenotype simultaneously via a complex structure, this type of marginal analysis may not be effective. To overcome these difficulties, we employ a kernel machine Cox regression framework and propose an efficient score test to assess the overall effect of a set of markers, such as genes within a pathway or a network, on survival outcomes. The proposed test has the advantage of capturing potentially nonlinear effects without explicitly specifying a particular nonlinear functional form. To approximate the null distribution of the score statistic, we propose a simple resampling procedure that can be easily implemented in practice. Numerical studies suggest that the test performs well with respect to both empirical size and power, even when the number of variables in a gene set is not small compared to the sample size.
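The general idea of a kernel-based score test for a marker set on survival outcomes can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions made for illustration only: the null model has no adjustment covariates, the kernel is Gaussian with a hypothetical bandwidth `rho`, the statistic is the quadratic form of null-model martingale residuals in the kernel matrix, and the null distribution is approximated by permuting the residuals rather than by the resampling procedure proposed in the paper (which the abstract does not specify). The permutation step additionally assumes that censoring is unrelated to the markers. All function names are hypothetical.

```python
import numpy as np


def martingale_residuals(time, event):
    """Null-model martingale residuals: event_i minus Nelson-Aalen hazard at time_i."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # n_at_risk[i] = number of subjects with follow-up time >= time_i
    n_at_risk = (time[None, :] >= time[:, None]).sum(axis=1)
    # Nelson-Aalen increment contributed by each subject at its own time
    increments = event / n_at_risk
    # cumulative hazard at time_i: sum of increments over subjects with time_j <= time_i
    earlier = time[None, :] <= time[:, None]
    cum_hazard = (increments[None, :] * earlier).sum(axis=1)
    return event - cum_hazard


def gaussian_kernel(Z, rho):
    """Gaussian kernel matrix K[i, j] = exp(-||Z_i - Z_j||^2 / rho)."""
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / rho)


def kernel_score_test(time, event, Z, rho=None, n_perm=999, seed=0):
    """Return (statistic, permutation p-value) for the marker set Z (n x p)."""
    Z = np.asarray(Z, dtype=float)
    if rho is None:
        rho = float(Z.shape[1])  # assumed default bandwidth: number of markers
    K = gaussian_kernel(Z, rho)
    M = martingale_residuals(time, event)
    q_obs = M @ K @ M  # quadratic-form score-type statistic
    rng = np.random.default_rng(seed)
    q_null = np.empty(n_perm)
    for b in range(n_perm):
        # Permuting residuals breaks any link between markers and survival
        M_perm = rng.permutation(M)
        q_null[b] = M_perm @ K @ M_perm
    p_value = (1 + np.sum(q_null >= q_obs)) / (n_perm + 1)
    return q_obs, p_value


if __name__ == "__main__":
    # Simulated example: 60 subjects, 5 markers, survival unrelated to markers
    rng = np.random.default_rng(42)
    Z = rng.normal(size=(60, 5))
    time = rng.exponential(size=60)
    event = rng.integers(0, 2, size=60)
    q, p = kernel_score_test(time, event, Z)
    print(f"Q = {q:.3f}, permutation p-value = {p:.3f}")
```

Because the kernel enters only through the matrix `K`, a different kernel (for example a linear kernel `Z @ Z.T`) can be swapped in without changing the rest of the sketch, which mirrors the abstract's point that no particular nonlinear functional form has to be specified.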