Chen Angela H, Lipka Alexander E
Department of Statistics, University of Illinois at Urbana-Champaign, Illinois 61801.
Department of Crop Sciences, University of Illinois at Urbana-Champaign, Illinois 61801.
G3 (Bethesda). 2016 Aug 9;6(8):2365-74. doi: 10.1534/g3.116.029090.
A typical plant genome-wide association study (GWAS) uses a mixed linear model (MLM) that includes a trait as the response variable, a marker as an explanatory variable, and fixed and random effect covariates accounting for population structure and relatedness. Although effective in controlling for false positive signals, this model typically fails to detect signals that are correlated with population structure or are located in genomic regions of high linkage disequilibrium (LD). This loss of power likely arises because each tested marker is itself used to estimate population structure and relatedness. Previous work has demonstrated that it is possible to increase the power of the MLM by estimating relatedness (i.e., kinship) with markers that are not located on the chromosome where the tested marker resides. To quantify the number of additional significant signals one can expect with this so-called K_chr model, we reanalyzed Mendelian, polygenic, and complex traits in two maize (Zea mays L.) diversity panels that had previously been assessed using the traditional MLM. We demonstrated that the K_chr model could find more significant associations, especially in high-LD regions. This finding is underscored by our identification of novel genomic signals proximal to the tocochromanol biosynthetic pathway gene ZmVTE1 that are associated with a ratio of tocotrienols. We conclude that the K_chr model can detect more intricate sources of allelic variation underlying agronomically important traits, and should therefore become more widely used for GWAS. To facilitate the implementation of the K_chr model, we provide code written in the R programming language.
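The core idea of the K_chr model, estimating kinship only from markers that lie on chromosomes other than the one carrying the tested marker, can be illustrated with a minimal sketch. This is not the authors' R code: it is written in Python for illustration, the function names (`centered_kinship`, `kinship_by_chromosome`) and the toy data are hypothetical, and the kinship estimator (the mean cross-product of mean-centered genotypes) is an assumption, since the abstract does not state which estimator was used.

```python
from collections import defaultdict


def centered_kinship(genotypes, marker_idx):
    """Kinship from the selected markers.

    Assumed estimator (not specified in the abstract): the average over
    markers of the cross-product of mean-centered genotype scores.
    """
    if not marker_idx:
        raise ValueError("no markers left to estimate kinship")
    n = len(genotypes)
    centered = []
    for j in marker_idx:
        col = [genotypes[i][j] for i in range(n)]
        mean = sum(col) / n
        centered.append([g - mean for g in col])
    m = len(centered)
    return [
        [sum(z[a] * z[b] for z in centered) / m for b in range(n)]
        for a in range(n)
    ]


def kinship_by_chromosome(genotypes, marker_chrom):
    """Map each chromosome to a kinship matrix built from all OTHER chromosomes."""
    by_chrom = defaultdict(list)
    for j, chrom in enumerate(marker_chrom):
        by_chrom[chrom].append(j)
    result = {}
    for chrom in by_chrom:
        others = [j for c, idx in by_chrom.items() if c != chrom for j in idx]
        result[chrom] = centered_kinship(genotypes, others)
    return result


# Hypothetical toy data: 3 individuals x 4 markers (0/1/2 allele counts),
# with two markers on each of two chromosomes.
genotypes = [
    [0, 2, 1, 0],
    [1, 0, 1, 2],
    [2, 1, 0, 2],
]
marker_chrom = ["1", "1", "2", "2"]

k_chr = kinship_by_chromosome(genotypes, marker_chrom)
# k_chr["1"] is built only from the chromosome 2 markers, so it is the matrix
# to use when testing markers located on chromosome 1.
```

In a full analysis, each tested marker would be fitted in the MLM with the kinship matrix for its own chromosome, so the marker and its neighbors in LD do not also contribute to the relatedness estimate that they are tested against.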