Bacanu Silviu-Alin, Nelson Matthew R, Whittaker John C
GlaxoSmithKline, Research Triangle Park, North Carolina, USA.
Genet Epidemiol. 2011 May;35(4):226-35. doi: 10.1002/gepi.20570.
Genome-wide association studies have succeeded in finding genetic variants associated with various phenotypes, but a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to rare variants. The latest sequencing technology facilitates the investigation of such rare variants, but their statistical analysis remains challenging. For quantitative traits, a commonly used approach is to contrast the frequency of putatively functional rare variants between subjects in the two tails of the trait distribution. The contrast is usually performed with Fisher's exact test or a similar test. These tests are conservative because they discard trait rank information, and they are most useful under the unrealistic homogeneity assumption (i.e., that variants have similar effects). We propose, and investigate via simulations, various designs for resequencing studies and statistical methods that incorporate information about rank and predicted function and that allow for heterogeneity of effects. We propose designs that accommodate heterogeneity by sequencing both tails and the middle of the trait distribution, and novel statistical tests for trend, for heterogeneity and for a combination of the two. The conclusions of the simulations are fourfold: (1) sequencing both tails and the middle of the trait distribution is desirable when heterogeneity is suspected, (2) trend and heterogeneity statistics should be used alongside other methods, (3) using rank information improves power over Fisher's exact test when the number of rare variants is not very large and (4) due to high misclassification rates, incorporating current predictions of a variant's function does not improve power.
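As an illustration of the conventional two-tail contrast described above (not of the trend or heterogeneity tests proposed by the authors, which the abstract does not specify), the following minimal Python sketch selects the subjects in the lower and upper tails of a quantitative trait, counts rare-variant carriers in each tail and applies a two-sided Fisher's exact test. The function names, the `tail_fraction` parameter and the toy data are assumptions made for this example only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def tail_contrast(trait, carrier, tail_fraction=0.1):
    """Contrast carrier counts between the upper and lower trait tails.

    trait         -- quantitative trait value per subject
    carrier       -- 1 if the subject carries at least one rare variant, else 0
    tail_fraction -- hypothetical design parameter: share of subjects per tail
    """
    order = sorted(range(len(trait)), key=lambda i: trait[i])
    k = max(1, int(len(trait) * tail_fraction))
    low, high = order[:k], order[-k:]
    a = sum(carrier[i] for i in high)  # carriers in the upper tail
    c = sum(carrier[i] for i in low)   # carriers in the lower tail
    table = (a, k - a, c, k - c)
    return table, fisher_exact_two_sided(*table)


if __name__ == "__main__":
    # Toy data (assumed): 20 subjects, carriers concentrated at high trait values.
    trait = list(range(20))
    carrier = [1 if i >= 17 else 0 for i in range(20)]
    table, p_value = tail_contrast(trait, carrier, tail_fraction=0.25)
    print("2x2 table (upper carriers, upper non-carriers, "
          "lower carriers, lower non-carriers):", table)
    print("Two-sided Fisher's exact p-value:", p_value)
```

Within each tail, the test uses only carrier counts, so two subjects in the same tail contribute identically regardless of how extreme their trait values are. This is the rank information that the abstract notes is discarded.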