Institute for Genomic Diversity, Cornell University, Ithaca, New York, USA.
Nat Genet. 2010 Apr;42(4):355-60. doi: 10.1038/ng.546. Epub 2010 Mar 7.
Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called 'compressed MLM', that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components for each marker tested. We applied these two methods, both independently and in combination, to selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate their usefulness in controlling for substructure in genetic association datasets across a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
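The compression idea in the abstract (clustering individuals into groups so the random effect is defined per group rather than per individual) can be illustrated with a minimal NumPy sketch. This is not the TASSEL implementation. Several choices here are assumptions made for illustration only: distance is taken as max(K) - K, clustering is a naive average-linkage agglomeration, and the group kinship is the mean kinship over all member pairs (self-pairs included on the diagonal). The helper names cluster_by_kinship, group_kinship and incidence_matrix are hypothetical.

```python
import numpy as np

def cluster_by_kinship(K, n_groups):
    """Naive average-linkage clustering of individuals (assumed distance: max(K) - K)."""
    n = K.shape[0]
    D = K.max() - K  # higher kinship -> smaller distance
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()  # average linkage
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a is unaffected
    labels = np.empty(n, dtype=int)
    for g, members in enumerate(clusters):
        labels[members] = g
    return labels

def group_kinship(K, labels):
    """Assumed group kinship: mean individual kinship between members of two groups."""
    groups = np.unique(labels)
    G = np.empty((len(groups), len(groups)))
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            G[i, j] = K[np.ix_(labels == gi, labels == gj)].mean()
    return G

def incidence_matrix(labels):
    """Z (n x g): row i has a 1 in the column of individual i's group."""
    groups = np.unique(labels)
    return (labels[:, None] == groups[None, :]).astype(float)

# Example: two unrelated families of three individuals each.
K = np.array([
    [1.0, 0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.5, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.5, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5, 1.0],
])
labels = cluster_by_kinship(K, n_groups=2)
G = group_kinship(K, labels)   # g x g kinship among groups
Z = incidence_matrix(labels)
A = Z @ G @ Z.T                # n x n covariance structure of the compressed random effect
```

Here the six individuals collapse into two groups, so the random effect has two levels instead of six; the individual-level covariance becomes Z G Z' scaled by the genetic variance.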
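The P3D idea (estimating variance components once and reusing them for every marker) can likewise be sketched under stated assumptions. In this illustration the null model y = Xb + u + e with Var(y) = s2_g * (A + delta * I) is fitted once by maximum likelihood over a grid of delta = s2_e / s2_g; each marker is then tested by generalized least squares with delta held fixed, so the inverse covariance is computed only once. ML by grid search, the t statistic, and the function names estimate_delta_once and scan_markers are hypothetical choices for this sketch and need not match what TASSEL does.

```python
import numpy as np

def neg_ml_profile(delta, y, X, A):
    """Negative profile ML log-likelihood for y ~ N(Xb, s2 * (A + delta * I))."""
    n = len(y)
    H = A + delta * np.eye(n)
    Hinv = np.linalg.inv(H)
    XtHinv = X.T @ Hinv
    beta = np.linalg.solve(XtHinv @ X, XtHinv @ y)
    r = y - X @ beta
    s2 = (r @ Hinv @ r) / n               # ML estimate of the genetic variance scale
    _, logdet = np.linalg.slogdet(H)
    return 0.5 * (n * np.log(2 * np.pi * s2) + logdet + n)

def estimate_delta_once(y, X, A, grid=np.logspace(-3, 3, 61)):
    """P3D-style step: fit the null model (no marker) a single time."""
    scores = [neg_ml_profile(d, y, X, A) for d in grid]
    return grid[int(np.argmin(scores))]

def scan_markers(y, X, A, markers, delta):
    """GLS test of each marker column with the variance ratio fixed at delta."""
    n = len(y)
    Hinv = np.linalg.inv(A + delta * np.eye(n))  # computed once, reused for all markers
    t_stats = []
    for m in markers.T:                           # markers: n x M, one column per marker
        Xm = np.column_stack([X, m])
        C = Xm.T @ Hinv @ Xm
        beta = np.linalg.solve(C, Xm.T @ Hinv @ y)
        r = y - Xm @ beta
        s2 = (r @ Hinv @ r) / (n - Xm.shape[1])
        se = np.sqrt(s2 * np.linalg.inv(C)[-1, -1])
        t_stats.append(beta[-1] / se)
    return np.array(t_stats)

# Simulated example: 10 families of 10, marker 0 is causal.
rng = np.random.default_rng(0)
n_fam, fam_size, n_markers = 10, 10, 20
n = n_fam * fam_size
A = np.kron(np.eye(n_fam), 0.5 * np.ones((fam_size, fam_size))) + 0.5 * np.eye(n)
markers = rng.binomial(2, 0.3, size=(n, n_markers)).astype(float)
family = np.repeat(rng.normal(0.0, 1.0, n_fam), fam_size)
y = 1.0 + 2.0 * markers[:, 0] + family + rng.normal(0.0, 1.0, n)
X = np.ones((n, 1))  # intercept only; no structure covariates in this toy example

delta = estimate_delta_once(y, X, A)       # variance ratio estimated a single time
t = scan_markers(y, X, A, markers, delta)  # the same delta reused for every marker
```

Without P3D, the grid search would be repeated inside the marker loop; with it, the per-marker cost is a single GLS solve. The A used here could equally be the compressed Z G Z' from a grouping step.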