Zawistowski Matthew, Reppell Mark, Wegmann Daniel, St Jean Pamela L, Ehm Margaret G, Nelson Matthew R, Novembre John, Zöllner Sebastian
Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
Department of Biology, University of Fribourg, Fribourg, Switzerland.
Eur J Hum Genet. 2014 Sep;22(9):1137-44. doi: 10.1038/ejhg.2013.297. Epub 2014 Jan 8.
There is substantial interest in the role of rare genetic variants in the etiology of complex human diseases. Several gene-based tests have been developed to simultaneously analyze multiple rare variants for association with phenotypic traits. The tests can largely be partitioned into two classes - 'burden' tests and 'joint' tests - based on how they accumulate evidence of association across sites. We used the empirical joint site frequency spectra of rare, nonsynonymous variation from a large multi-population sequencing study to explore the effect of realistic rare variant population structure on gene-based tests. We observed an important difference between the two test classes: their susceptibility to population stratification. Focusing on European samples, we found that joint tests, which allow variants to have opposite directions of effect, consistently showed higher levels of P-value inflation than burden tests. We determined that the differential stratification was caused by two specific patterns in the interpopulation distribution of rare variants, each correlating with inflation in one of the test classes. The pattern that inflates joint tests is more prevalent in real data, explaining the higher levels of inflation in these tests. Furthermore, we show that the different sources of inflation between tests lead to heterogeneous responses to genomic control correction and the number of variants analyzed. Our results indicate that care must be taken when interpreting joint and burden analyses of the same set of rare variants, in particular, to avoid mistaking inflated P-values in joint tests for stronger signals of true associations.
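The distinction between the two test classes, and the genomic control correction mentioned above, can be illustrated with a minimal sketch. It is not the implementation used in the study: the abstract does not name the specific tests, so the code assumes unweighted score statistics, with the burden statistic formed by summing per-variant scores before squaring and the joint statistic formed by squaring before summing. The function names, the toy genotype matrix, and the constant name are all hypothetical. The inflation factor follows the usual genomic control definition: the median observed 1-df chi-square statistic divided by its expected median under the null.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def variant_scores(genotypes, phenotypes):
    """Per-variant score S_j = sum_i g_ij * (y_i - mean(y)).

    genotypes: one row per individual, one column per rare variant.
    phenotypes: one value per individual (here 1 = case, 0 = control).
    """
    y_bar = sum(phenotypes) / len(phenotypes)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * (y - y_bar) for row, y in zip(genotypes, phenotypes))
        for j in range(n_variants)
    ]


def burden_statistic(scores):
    # Collapse first, then square: scores of opposite sign cancel.
    return sum(scores) ** 2


def joint_statistic(scores):
    # Square first, then sum: the sign of each score is ignored.
    return sum(s * s for s in scores)


def genomic_control_lambda(p_values):
    """Median observed 1-df chi-square divided by its null median.

    P-values must lie in (0, 1]; each is mapped back to a 1-df
    chi-square statistic through the standard normal quantile.
    """
    norm = NormalDist()
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Toy data (hypothetical): variant A is carried only by two cases,
# variant B only by two controls.
genotypes = [
    [1, 0], [1, 0], [0, 0], [0, 0],  # cases
    [0, 1], [0, 1], [0, 0], [0, 0],  # controls
]
phenotypes = [1, 1, 1, 1, 0, 0, 0, 0]

scores = variant_scores(genotypes, phenotypes)
print(burden_statistic(scores))  # 0.0: the two scores cancel
print(joint_statistic(scores))   # 2.0: both scores contribute
```

In this toy case the two variants point in opposite directions, so the collapsed statistic is zero while the joint statistic is not. The same mechanics apply whether the imbalance reflects a true effect or a frequency difference between populations, which is why the two classes can respond differently to stratification. A value of `genomic_control_lambda` near 1 indicates little inflation; values above 1 indicate inflated test statistics.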