Virginia Commonwealth University, Richmond, Virginia 23219, USA.
Genet Epidemiol. 2012 May;36(4):333-9. doi: 10.1002/gepi.21625. Epub 2012 Apr 16.
Univariate analysis of markers has modest power when there are multiple causal variants within a gene. Under this scenario, combining the effects of all variants from a gene into a gene-wide statistic is thought to increase power. However, it remains unclear (1) how well the most commonly used gene-wide methods perform in whole-genome scans and (2) how well these methods scale to more computationally intensive analyses, e.g., analysis of genome-wide sequence data. We attempt to answer these questions by using realistic simulations to assess the performance of a range of gene-based methods: (1) commonly used, e.g., VEGAS and GATES; (2) less commonly used, e.g., Simes, adaptive sum (aSUM), and kernel methods; and (3) a combination of univariate and multivariate tests that we proposed for the analysis of markers in linkage disequilibrium. Simes is the fastest method and has good power for single causal variant models. The aSUM method has good power for multiple causal variant models, especially for shorter genes. Our proposed statistic yields good power for all causal models. Given the extreme data volumes coming from sequencing studies, we recommend a two-step analysis of genome scans. The initial step uses the very fast Simes procedure to flag possibly interesting genes. The second step refines interesting signals by using more computationally intensive methods, e.g., (1) aSUM for shorter genes and (2) VEGAS for longer genes. Alternatively, genome scans can be analyzed using only our proposed method while sacrificing only a modest amount of power.
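The screening step of the two-step analysis can be illustrated with a minimal sketch. It assumes the standard Simes combination, in which the gene-level p-value for m markers is the minimum over i of m * p_(i) / i, with p_(i) the i-th smallest marker p-value. The abstract does not give implementation details, so the function names, gene labels, p-values, and threshold below are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Sequence


def simes_pvalue(pvalues: Sequence[float]) -> float:
    """Gene-level Simes p-value: min over i of m * p_(i) / i (capped at 1)."""
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one marker p-value")
    ordered = sorted(pvalues)  # p_(1) <= p_(2) <= ... <= p_(m)
    return min(1.0, min(m * p / rank for rank, p in enumerate(ordered, start=1)))


def flag_genes(gene_pvalues: Dict[str, Sequence[float]], threshold: float) -> List[str]:
    """Step 1 of the screen: keep genes whose Simes p-value is at or below threshold.

    The threshold is an assumption of this sketch, not a value from the study.
    """
    return [
        gene
        for gene, pvals in gene_pvalues.items()
        if simes_pvalue(pvals) <= threshold
    ]


# Hypothetical per-marker univariate p-values, grouped by gene.
example = {
    "gene_a": [0.001, 0.20, 0.65, 0.90],
    "gene_b": [0.30, 0.45, 0.80],
}
flagged = flag_genes(example, threshold=0.05)
print(flagged)
```

Genes flagged in this way would then be passed to a more computationally intensive method (aSUM or VEGAS in the recommendation above), which is not sketched here. The cost of the Simes step is dominated by sorting the marker p-values within each gene, which is consistent with its use as a fast first pass.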