Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA.
Bioinformatics. 2013 Jan 1;29(1):99-105. doi: 10.1093/bioinformatics/bts643. Epub 2012 Nov 4.
Pathway or gene set analysis has been widely applied to genomic data. Many current pathway testing methods use univariate test statistics calculated from individual genomic markers, an approach that ignores the correlations and interactions between candidate markers. Random forests-based pathway analysis is a promising approach for incorporating complex correlation and interaction patterns, but previous approaches have considered pathways separately and therefore did not take pathway cross-talk information into account.
In this article, we develop a new pathway hunting algorithm for survival outcomes using random survival forests, which prioritizes important pathways by accounting for gene correlation and genomic interactions. We show, using both synthetic and real data, that the proposed method performs favourably compared with five popular pathway testing methods. We find that the proposed methodology provides an efficient and powerful pathway modelling framework for high-dimensional genomic data.
The R code for the analysis used in this article is available upon request.