Department of Biological Psychology, VU University Amsterdam, Van der Boechorststraat 1, Amsterdam, The Netherlands.
Behav Genet. 2013 May;43(3):254-66. doi: 10.1007/s10519-013-9590-1. Epub 2013 Mar 22.
When phenotypic but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that including family-based imputed genotypes can boost statistical power. Here, using simulations, we compared the performance of two statistical approaches suitable for modeling imputed genotype data: the mixture approach, which uses the full distribution of the imputed genotypes, and the dosage approach, in which the mean of the conditional distribution serves as the imputed genotype. Simulations varied sibship size, the size of the phenotypic correlations among siblings, imputation accuracy and the minor allele frequency of the causal SNP. Furthermore, because imputing sibling data and extending the model to sibships of size two or greater requires modeling the familial covariance matrix, we examined whether misspecification of this matrix affects power. Finally, the simulation results were verified empirically in two datasets, one with a continuous phenotype (height) and one with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and dosage approaches are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears robust to the considered misspecification of the background covariance structure, provided that phenotypic correlations among siblings are low to moderate. Empirical results show that including imputed sibling genotypes in association analysis does not always yield a larger test statistic; with small effect sizes, the observed test statistic may even decrease. That is, when the power benefit is small, the distribution of the test statistic under the alternative shifts only slightly, so the chance of obtaining a smaller test statistic than in the analysis without imputed genotypes is correspondingly larger.
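The difference between the two approaches can be illustrated for a single ungenotyped sibling. The sketch below is not the study's imputation procedure: it assumes one genotyped full sibling, no parental genotypes and Hardy-Weinberg equilibrium, derives the conditional genotype distribution from identity-by-descent sharing (1/4, 1/2, 1/4), and scores one phenotype in isolation, ignoring the familial covariance matrix discussed above. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def sib_genotype_probs(g1: int, p: float) -> list[float]:
    """P(g2 = 0, 1, 2 | g1) for an ungenotyped full sibling.

    Assumes Hardy-Weinberg equilibrium and no parental genotypes; g1 and g2
    count copies of the allele whose frequency is p. Full siblings share
    0, 1 or 2 alleles identical by descent (IBD) with probability 1/4, 1/2, 1/4.
    """
    q = 1.0 - p
    ibd0 = [q * q, 2 * p * q, p * p]                     # independent draw
    ibd2 = [1.0 if g == g1 else 0.0 for g in range(3)]   # identical genotype
    # IBD 1: one allele copied from sibling 1 (chosen at random),
    # the other drawn from the population.
    ibd1 = {0: [q, p, 0.0], 1: [q / 2, 0.5, p / 2], 2: [0.0, q, p]}[g1]
    return [0.25 * a + 0.5 * b + 0.25 * c for a, b, c in zip(ibd0, ibd1, ibd2)]


def dosage(probs: list[float]) -> float:
    """Mean of the conditional genotype distribution (the imputed dosage)."""
    return sum(g * pg for g, pg in enumerate(probs))


def loglik_dosage(y, probs, mu, beta, sigma):
    """Dosage approach: plug the expected genotype into the regression."""
    return math.log(NormalDist(mu + beta * dosage(probs), sigma).pdf(y))


def loglik_mixture(y, probs, mu, beta, sigma):
    """Mixture approach: average the likelihood over the three genotypes."""
    return math.log(sum(pg * NormalDist(mu + beta * g, sigma).pdf(y)
                        for g, pg in enumerate(probs)))


probs = sib_genotype_probs(g1=2, p=0.5)
print(probs)          # [0.0625, 0.375, 0.5625]
print(dosage(probs))  # 1.5
print(loglik_dosage(1.2, probs, 0.0, 0.3, 1.0),
      loglik_mixture(1.2, probs, 0.0, 0.3, 1.0))
```

Under the null (beta = 0) both log-likelihoods reduce to the same normal density, so the two approaches can differ only through how the genetic effect is modeled when beta is nonzero.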
Because genetic effects are typically hypothesized to be small, in practice the decision whether to use family-based imputation to increase power should be informed by prior power calculations and by the degree of background correlation among siblings.
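A prior power calculation of this kind can be roughed out with the normal approximation to a 1-df association test. The sketch below is a heuristic, not the study's method: it assumes an additive SNP explaining a fraction `q2` of phenotypic variance, counts each imputed sibling as `r2` (imputation accuracy) of a genotyped individual, treats everyone as unrelated (so the background sibling correlation is ignored), and defaults to the conventional genome-wide threshold of 5e-8. `power_1df` and `ncp_additive` are hypothetical helper names.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_1df(ncp: float, alpha: float = 5e-8) -> float:
    """Power of a two-sided 1-df test (Wald or LRT) with noncentrality ncp."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(ncp)
    # Probability that |Z + shift| exceeds the critical value.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def ncp_additive(n_genotyped: int, n_imputed: int, r2: float, q2: float) -> float:
    """Approximate noncentrality for an additive SNP explaining q2 of the variance.

    Heuristic assumption: each imputed individual counts as r2 of a genotyped
    one, and all individuals are treated as unrelated.
    """
    n_eff = n_genotyped + r2 * n_imputed
    return n_eff * q2 / (1 - q2)


# Hypothetical design: 2000 genotyped participants plus 2000 imputed siblings.
before = power_1df(ncp_additive(2000, 0, r2=0.8, q2=0.005))
after = power_1df(ncp_additive(2000, 2000, r2=0.8, q2=0.005))
print(f"power without imputed sibs: {before:.3f}, with: {after:.3f}")
```

This gives expected power only; as the results above note, the test statistic realized in a given dataset can still be smaller after adding imputed siblings, and a positive sibling correlation would shrink the gain further.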