Gross Arnd, Tönjes Anke, Scholz Markus
Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Haertelstrasse 16-18, Leipzig, 04107, Germany.
LIFE - Leipzig Research Center for Civilization Diseases, University of Leipzig, Philipp-Rosenthal-Strasse 27, Leipzig, 04103, Germany.
BMC Genet. 2017 Dec 6;18(1):104. doi: 10.1186/s12863-017-0571-x.
When testing for SNP (single nucleotide polymorphism) associations in related individuals, observations are not independent. Simple linear regression, which assumes independent, normally distributed residuals, then results in an increased type I error, and the power of the test is affected in a more complicated manner. Inflation of type I error is often successfully corrected by genomic control. However, this reduces the power of the test when relatedness is of concern. In the present paper, we derive explicit formulae to investigate how heritability and strength of relatedness contribute to variance inflation of the effect estimate of the linear model. Further, we study the consequences of variance inflation on hypothesis testing and compare the results with those of genomic control correction. We apply the developed theory to the publicly available HapMap trio data (N=129), the Sorbs (a self-contained population with N=977, characterised by a cryptic relatedness structure) and synthetic family studies with different sample sizes (ranging from N=129 to N=999) and different degrees of relatedness.
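To see why relatedness inflates the variance of the effect estimate, the following minimal sketch assumes a polygenic model in which Cov(y) = sigma^2 * V with V = h2 * A + (1 - h2) * I, where A = 2 * Phi is the additive relationship matrix (Phi the kinship matrix) and h2 the heritability. For a fixed, centred genotype vector x, the OLS slope then has variance sigma^2 * x'Vx / (x'x)^2, whereas the naive formula gives sigma^2 / (x'x); their ratio x'Vx / x'x measures the inflation. The model, the simplification that the residual variance estimate equals sigma^2, the function names and the trio genotypes are illustrative assumptions, not the authors' code.

```python
def trio_relationship_matrix(n_trios):
    """Additive relationship matrix A = 2 * Phi for n_trios mutually unrelated
    father-mother-child trios (parents unrelated, no inbreeding)."""
    n = 3 * n_trios
    A = [[0.0] * n for _ in range(n)]
    for t in range(n_trios):
        father, mother, child = 3 * t, 3 * t + 1, 3 * t + 2
        for i in (father, mother, child):
            A[i][i] = 1.0
        # a parent and its child have relationship coefficient 0.5
        for parent in (father, mother):
            A[parent][child] = A[child][parent] = 0.5
    return A


def ols_variance_ratio(x, A, h2):
    """True / naive variance of the OLS slope for a fixed genotype vector x,
    assuming Cov(y) = sigma^2 * (h2 * A + (1 - h2) * I) and that the residual
    variance estimate equals sigma^2 (illustrative simplification)."""
    n = len(x)
    mean_x = sum(x) / n
    xc = [xi - mean_x for xi in x]  # centring removes the intercept
    xtx = sum(v * v for v in xc)
    xAx = sum(xc[i] * A[i][j] * xc[j] for i in range(n) for j in range(n))
    xVx = h2 * xAx + (1.0 - h2) * xtx  # x' V x with V = h2*A + (1-h2)*I
    return xVx / xtx


# Hypothetical genotypes for two trios in which children match their parents
A = trio_relationship_matrix(2)
x = [2, 2, 2, 0, 0, 0]
print(ols_variance_ratio(x, A, h2=0.5))  # about 1.33: variance inflated
print(ols_variance_ratio(x, A, h2=0.0))  # 1.0: no heritability, no inflation
```

For a single genotype vector the ratio can also fall below 1; inflation is a statement about the average over SNPs, where relatives tend to share alleles.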
We derive explicit, easy-to-apply approximation formulae to estimate the impact of relatedness on the variance of the effect estimate of the linear regression model. Variance inflation increases with increasing heritability. The relatedness structure also affects the degree of variance inflation, as shown for the example family structures. Variance inflation is smallest for the HapMap trios, followed by a synthetic family study that corresponds to the trio data but has a larger sample size than HapMap. The next strongest inflation is observed for the Sorbs and, finally, the strongest for a synthetic family study with a more extreme relatedness structure but a sample size similar to that of the Sorbs. Type I error increases rapidly with increasing inflation. However, for smaller significance levels, power increases with increasing inflation, while the opposite holds for larger significance levels. When genomic control is applied, type I error is preserved, while power decreases rapidly with increasing variance inflation.
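The qualitative behaviour of type I error and power described above can be reproduced with a simple normal model. This sketch assumes that the naive Wald statistic T = beta_hat / SE_naive is normally distributed with mean mu (the non-centrality under the naive model, zero under the null) and variance lam, the variance inflation factor. It idealises genomic control as dividing T by sqrt(lam) with lam known exactly, whereas in practice lam is estimated from genome-wide test statistics. The function name and the values mu = 4 and alpha = 5e-8 are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejection_prob(mu, lam, alpha, genomic_control=False):
    """Two-sided rejection probability of the naive Wald test when the
    statistic is T ~ N(mu, lam) instead of N(mu, 1).

    mu:  non-centrality under the naive model (0 under the null hypothesis)
    lam: variance inflation factor of the effect estimate
    genomic_control: divide T by sqrt(lam), assuming lam is known exactly
    """
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    if genomic_control:
        # T / sqrt(lam) ~ N(mu / sqrt(lam), 1)
        shift, scale = mu / lam ** 0.5, 1.0
    else:
        shift, scale = mu, lam ** 0.5
    upper = 1.0 - STD_NORMAL.cdf((z - shift) / scale)
    lower = STD_NORMAL.cdf((-z - shift) / scale)
    return upper + lower


for lam in (1.0, 1.05, 1.2):
    print(lam,
          rejection_prob(0.0, lam, 5e-8),         # type I error, naive test
          rejection_prob(4.0, lam, 5e-8),         # power, naive test
          rejection_prob(4.0, lam, 5e-8, True))   # power after genomic control
```

Under this model a larger lam widens the distribution of T: at a stringent threshold (critical value above mu) more mass crosses it and power rises, while at a lenient threshold (critical value below mu) power falls. Genomic control shrinks the effective non-centrality to mu / sqrt(lam), which is why its power drops as inflation grows.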
Stronger relatedness as well as higher heritability result in increased variance of the effect estimate of simple linear regression analysis. While type I error rates are generally inflated, the behaviour of power is more complex, since power can be increased or reduced depending on relatedness and the heritability of the phenotype. Genomic control cannot be recommended to deal with inflation due to relatedness: although it preserves type I error, the loss in power can be considerable. We provide a simple formula for estimating variance inflation given the relatedness structure and the heritability of a trait of interest. As a rule of thumb, variance inflation below 1.05 does not require correction, and simple linear regression analysis is still appropriate.
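The abstract does not reproduce the authors' formula. As a hedged illustration only, a common first-order approximation follows if genotypes and polygenic effects both covary according to A and the expectations of numerator and denominator of x'Vx / x'x are taken separately: lambda ~= 1 + h2 * sum_{i != j} A_ij^2 / N, assuming no inbreeding (A_ii = 1). This may differ from the formula derived in the paper. The helper names and the trio layout are hypothetical; only the 1.05 threshold comes from the text. For unrelated trios each trio contributes 1 to the sum over 3 individuals, so lambda ~= 1 + h2 / 3, and the rule of thumb is met only for h2 below 0.15 under this approximation.

```python
# Relationship matrix of one father-mother-child trio (A = 2 * kinship)
TRIO = [[1.0, 0.0, 0.5],
        [0.0, 1.0, 0.5],
        [0.5, 0.5, 1.0]]


def block_diagonal(block, copies):
    """Relationship matrix for `copies` mutually unrelated families."""
    k = len(block)
    n = k * copies
    A = [[0.0] * n for _ in range(n)]
    for c in range(copies):
        off = c * k
        for i in range(k):
            for j in range(k):
                A[off + i][off + j] = block[i][j]
    return A


def approx_variance_inflation(A, h2):
    """First-order approximation lambda ~= 1 + h2 * sum_{i != j} A_ij^2 / N.
    Assumes no inbreeding (A_ii = 1); not necessarily the paper's formula."""
    n = len(A)
    off_diag_sq = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
    return 1.0 + h2 * off_diag_sq / n


def needs_correction(lam, threshold=1.05):
    """Rule of thumb from the abstract: inflation below 1.05 needs no correction."""
    return lam >= threshold


A = block_diagonal(TRIO, 43)  # 129 individuals in hypothetical unrelated trios
for h2 in (0.1, 0.3, 0.5):
    lam = approx_variance_inflation(A, h2)
    print(h2, round(lam, 3), needs_correction(lam))
```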