Minică Camelia C, Genovese Giulio, Hultman Christina M, Pool René, Vink Jacqueline M, Neale Michael C, Dolan Conor V, Neale Benjamin M
Department of Biological Psychology, Vrije Universiteit, Amsterdam, The Netherlands.
The Stanley Center for Psychiatric Research, Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, MA.
Twin Res Hum Genet. 2017 Apr;20(2):108-118. doi: 10.1017/thg.2017.7. Epub 2017 Feb 27.
Sequence-based association studies are at a critical inflexion point with the increasing availability of exome-sequencing data. A popular test of association is the sequence kernel association test (SKAT). Weights are embedded within SKAT to reflect the hypothesized contribution of the variants to the trait variance. Because the true weights are generally unknown, and so are subject to misspecification, we examined the efficiency of a data-driven weighting scheme. We propose the use of a set of theoretically defensible weighting schemes, of which, we assume, the one that gives the largest test statistic is likely to best capture the relationship between allele frequency and functional effect. We show that the use of alternative weights obviates the need to impose arbitrary frequency thresholds. As both the score test and the likelihood ratio test (LRT) may be used in this context, and may differ in power, we characterize the behavior of both tests. The two tests have equal power if the set includes weights resembling the correct ones. However, if the weights are badly specified, the LRT shows superior power (due to its robustness to misspecification). With this data-driven weighting procedure, the LRT detected significant signal in genes located in regions already confirmed as associated with schizophrenia, namely PRRC2A (p = 1.020e-06) and VARS2 (p = 2.383e-06), in the Swedish schizophrenia case-control cohort of 11,040 individuals with exome-sequencing data. The score test is currently preferred for its computational efficiency and power. Indeed, assuming correct specification, in some circumstances, the score test is the most powerful test. However, the LRT has the advantageous properties of being generally more robust and more powerful under weight misspecification. This is an important result given that, arguably, misspecified models are likely to be the rule rather than the exception in weighting-based approaches.
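The idea of trying several weighting schemes and keeping the one with the largest test statistic can be illustrated with a minimal Python sketch. This is not the authors' implementation. It uses a simple SKAT-style score statistic with an intercept-only null model, and it obtains the p-value for the maximum by permutation, which is an assumption made here because taking the maximum over schemes changes the null distribution. The function names, the three candidate schemes (two Beta-density weights and a flat weight), and the unit-norm rescaling that puts the schemes on a common scale are all hypothetical choices for illustration. The LRT discussed in the abstract is not covered.

```python
import numpy as np
from scipy.stats import beta


def skat_q(genotypes, residuals, weights):
    """SKAT-style score statistic Q = sum_j (w_j * g_j' r)^2."""
    scores = genotypes.T @ residuals  # per-variant score, shape (m,)
    return float(np.sum((weights * scores) ** 2))


def candidate_weights(maf):
    """Hypothetical candidate set of weights as functions of the MAF."""
    schemes = {
        "beta(1,25)": beta.pdf(maf, 1, 25),        # strongly up-weights rare variants
        "beta(0.5,0.5)": beta.pdf(maf, 0.5, 0.5),  # milder up-weighting of rare variants
        "flat": np.ones_like(maf),                 # all variants weighted equally
    }
    # Assumption: rescale to unit norm so Q is comparable across schemes.
    return {name: w / np.linalg.norm(w) for name, w in schemes.items()}


def max_weight_test(genotypes, y, n_perm=200, seed=0):
    """Pick the scheme with the largest Q; permutation p-value for the max."""
    rng = np.random.default_rng(seed)
    n = genotypes.shape[0]
    # Clip the estimated MAF away from 0 so the Beta densities stay finite.
    maf = np.clip(genotypes.mean(axis=0) / 2.0, 1.0 / (2 * n), 0.5)
    schemes = candidate_weights(maf)
    resid = y - y.mean()  # residuals under an intercept-only null model

    def max_q(r):
        qs = {name: skat_q(genotypes, r, w) for name, w in schemes.items()}
        best_name = max(qs, key=qs.get)
        return best_name, qs[best_name]

    best, q_obs = max_q(resid)
    # Recompute the maximum over schemes on each permutation of the residuals.
    q_null = np.array([max_q(rng.permutation(resid))[1] for _ in range(n_perm)])
    p_value = (1 + np.sum(q_null >= q_obs)) / (n_perm + 1)
    return best, q_obs, p_value


# Small simulated example (null data: the trait is unrelated to the genotypes).
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.05, size=(200, 10)).astype(float)
y = rng.normal(size=200)
best, q_obs, p = max_weight_test(G, y, n_perm=99)
print(best, q_obs, p)
```

Recomputing the maximum inside each permutation is what keeps the selection step from inflating the type I error: the observed maximum is compared with a null distribution of maxima rather than with the null distribution of a single pre-specified statistic.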