Li Xiang, Basu Saonli, Miller Michael B, Iacono William G, McGue Matt
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.
Hum Hered. 2011;71(1):67-82. doi: 10.1159/000324839. Epub 2011 Apr 8.
Genome-wide association studies (GWAS) using family data involve association analyses between hundreds of thousands of markers and a trait for a large number of related individuals. The correlations among relatives bring statistical and computational challenges when performing these large-scale association analyses. Recently, several rapid methods accounting for both within- and between-family variation have been proposed. However, these techniques mostly model the phenotypic similarities in terms of genetic relatedness. The familial resemblances in many family-based studies, such as twin studies, are due not only to genetic relatedness but also to shared environmental effects and assortative mating. In this paper, we propose two generalized least squares (GLS) models for rapid association analysis of family-based GWAS, which accommodate both genetic and environmental contributions to familial resemblance. In our first model, we estimated the joint genetic and environmental variations. In our second model, we estimated the genetic and environmental components separately. Through simulation studies, we demonstrated that our proposed approaches are more powerful and computationally efficient than a number of existing methods. We show that estimating the residual variance-covariance matrix in the GLS models without SNP effects does not lead to an appreciable bias in the p values as long as the SNP effect is small (i.e., accounting for no more than 1% of trait variance).
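The general idea of a GLS association test with a residual covariance matrix fixed in advance can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes twin pairs with a single within-pair correlation (a compound-symmetric block that lumps genetic and environmental resemblance together), treats the covariance matrix as already estimated without the SNP in the model, and uses a normal reference distribution for the test statistic. The names `pair_covariance` and `gls_snp_test` are hypothetical.

```python
import math
import numpy as np


def pair_covariance(n_pairs, total_var, within_pair_corr):
    """Block-diagonal covariance for n_pairs families of two relatives each.

    Assumption: every pair shares the same correlation, which stands in for
    the combined genetic and environmental resemblance.
    """
    block = total_var * np.array([[1.0, within_pair_corr],
                                  [within_pair_corr, 1.0]])
    return np.kron(np.eye(n_pairs), block)


def gls_snp_test(y, snp, sigma):
    """GLS test of one SNP, with the residual covariance `sigma` held fixed.

    Returns the SNP effect estimate, its standard error, and a two-sided
    p value from a normal approximation.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), snp])   # intercept + SNP genotype
    L = np.linalg.cholesky(sigma)            # sigma = L @ L.T
    Xw = np.linalg.solve(L, X)               # whitened design matrix
    yw = np.linalg.solve(L, y)               # whitened trait
    xtx_inv = np.linalg.inv(Xw.T @ Xw)       # (X' sigma^-1 X)^-1
    beta = xtx_inv @ Xw.T @ yw               # GLS coefficient estimates
    se = math.sqrt(xtx_inv[1, 1])            # standard error of SNP effect
    z = beta[1] / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p value
    return beta[1], se, p
```

Because `sigma` does not depend on the SNP, its Cholesky factor and the whitened trait could be computed once and reused for every marker, which is where the speed of this kind of approach comes from. The sketch recomputes them on each call only to stay short.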