Blavatnik School of Computer Science, Tel Aviv University, 6997801 Israel
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115.
Genetics. 2017 Dec;207(4):1275-1283. doi: 10.1534/genetics.117.300395. Epub 2017 Oct 12.
Testing for the existence of variance components in linear mixed models is a fundamental task in many applied fields. In statistical genetics, the score test has recently become instrumental in testing for an association between a set of genetic markers and a phenotype. With few markers, this amounts to set-based variance component tests, which attempt to increase power in association studies by aggregating weak individual effects. When the entire genome is considered, it allows testing for the heritability of a phenotype, defined as the proportion of phenotypic variance explained by genetics. In the popular score-based Sequence Kernel Association Test (SKAT) method, the assumed distribution of the score test statistic is uncalibrated in small samples, and correcting it is computationally expensive. This may cause severe inflation or deflation of p-values, even when the null hypothesis is true. Here, we characterize the conditions under which this discrepancy holds, and show that it can also occur in large real datasets, such as a dataset from the Wellcome Trust Case Control Consortium 2 (n = 13,950) study, and, in particular, when the individuals in the sample are unrelated. In these cases, the SKAT approximation tends to be highly overconservative and therefore underpowered. To address this limitation, we propose an efficient method to calculate exact p-values for the score test in the case of a single variance component and a continuous response vector, which can speed up the analysis by orders of magnitude. Our results enable fast and accurate application of the score test in heritability and in set-based association tests. Our method is available at http://github.com/cozygene/RL-SKAT.
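To make the idea of a finite-sample null distribution concrete, the following is a minimal sketch and not the RL-SKAT algorithm itself. It assumes, as an illustration only, that the score statistic for a single variance component is written as a ratio of quadratic forms in the covariate-adjusted phenotype, and it estimates the null tail probability by Monte Carlo rather than by an exact calculation. The function name `score_ratio_pvalue` and the arguments `y`, `X`, `K`, `n_draws`, and `seed` are hypothetical; `X` is assumed to have full column rank.

```python
import numpy as np


def score_ratio_pvalue(y, X, K, n_draws=20000, seed=0):
    """Monte Carlo p-value for a ratio-form score statistic (illustrative only).

    y : (n,) continuous phenotype
    X : (n, p) covariates with full column rank (include an intercept column)
    K : (n, n) symmetric kernel, e.g. Z @ Z.T / m for a marker matrix Z
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Orthonormal basis U of the orthogonal complement of the columns of X,
    # so that U.T @ y removes the fixed effects.
    Q, _ = np.linalg.qr(X, mode="complete")
    U = Q[:, p:]

    r = U.T @ y          # adjusted phenotype, length n - p
    M = U.T @ K @ U      # kernel restricted to the adjusted space

    # Observed statistic: a ratio of quadratic forms, free of the noise scale.
    q_obs = (r @ M @ r) / (r @ r)

    # Under the null, the ratio exceeds q_obs exactly when
    # sum_i (lambda_i - q_obs) * chi2_i > 0, with independent chi2(1) draws.
    lam = np.linalg.eigvalsh(M)
    chi = rng.chisquare(1, size=(n_draws, lam.size))
    exceed = (chi @ (lam - q_obs)) > 0

    # Add-one correction keeps the estimate strictly positive.
    p_value = (exceed.sum() + 1) / (n_draws + 1)
    return q_obs, p_value
```

Because the statistic is a ratio, its null distribution depends only on the eigenvalues of the restricted kernel and not on the residual variance. A Monte Carlo estimate like this cannot resolve p-values much below `1 / n_draws`, which is why an exact and efficient calculation matters at genome-wide significance thresholds.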