Genus plc, Hendersonville, TN, USA.
Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA.
Front Genet. 2014 May 20;5:134. doi: 10.3389/fgene.2014.00134. eCollection 2014.
The purpose of this study was to compare results obtained from various methodologies for genome-wide association studies (GWAS), when applied to real data, in terms of the number and commonality of regions identified and the genetic variance they explain, computational speed, and possible pitfalls in the interpretation of results. The methodologies were two iteratively reweighted single-step genomic BLUP procedures (ssGWAS1 and ssGWAS2), a single-marker model (CGWAS), and BayesB. The ssGWAS methods use genomic estimated breeding values (GEBVs) based on combined pedigree, genomic, and phenotypic information, whereas CGWAS and BayesB use only phenotypes of genotyped animals or pseudo-phenotypes. In this study, ssGWAS was performed by converting GEBVs to SNP marker effects. Unequal marker variances were incorporated as weights in a new genomic relationship matrix, and the SNP weights were refined iteratively. The data were body weights at 6 weeks of age for 274,776 broiler chickens, of which 4553 were genotyped with a 60k SNP chip. Genomic regions were compared on the basis of the genetic variance explained by local SNP regions (windows of 20 SNPs). After 3 iterations, noise was greatly reduced for ssGWAS1, and its results were similar to those of CGWAS, with 4 of the top 10 regions in common. In contrast, the BayesB plot was dominated by a single region explaining 23.1% of the genetic variance. ssGWAS1 found the same region with the same rank, but attributed only 3% of the genetic variance to it. These findings emphasize the need for caution when comparing and interpreting results from different methods, and highlight that the associations detected, and their strength, depend strongly on the methodology and on implementation details. BayesB appears to shrink too many regions to zero while overestimating the genetic variance attributed to the remaining SNP effects. The truth most likely lies somewhere between the methods and remains to be determined.
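The ssGWAS steps summarized in the abstract (GEBVs back-solved to SNP effects, SNP weights refined iteratively, and genetic variance explained by 20-SNP windows) can be sketched in Python with NumPy. This is a minimal, hypothetical illustration under several assumptions not taken from the study: GEBVs come from a weighted GBLUP on genotyped animals only (a stand-in for the single-step model, which also uses pedigree and non-genotyped relatives); the back-solve uses a_hat = lambda * D Z' G^-1 u_hat with lambda = 1 / sum(2 p_i (1 - p_i)); weights are updated as d_i = a_hat_i^2 * 2 p_i (1 - p_i) and rescaled; a small ridge keeps G invertible; windows are non-overlapping; and the genotypes and phenotypes are simulated. All function names are hypothetical, and this weighting is not claimed to match ssGWAS1 or ssGWAS2 exactly.

```python
import numpy as np


def center_genotypes(M):
    """Center a 0/1/2 genotype matrix by observed allele frequencies."""
    p = M.mean(axis=0) / 2.0
    return M - 2.0 * p, p


def gebv_gblup(Z, d, y, lam, h2, ridge=0.01):
    """Hypothetical stand-in for single-step: weighted GBLUP on genotyped animals only.

    G = lam * Z D Z' + ridge * I (ridge is an assumption so G is invertible);
    u_hat = G (G + alpha I)^-1 (y - mean(y)), alpha = (1 - h2) / h2, mean treated as known.
    """
    n = Z.shape[0]
    G = lam * (Z * d) @ Z.T + ridge * np.eye(n)
    alpha = (1.0 - h2) / h2
    u_hat = G @ np.linalg.solve(G + alpha * np.eye(n), y - y.mean())
    return u_hat, G


def gebv_to_snp_effects(Z, d, G, u_hat, lam):
    """Back-solve SNP effects from GEBVs: a_hat = lam * D Z' G^-1 u_hat."""
    return lam * d * (Z.T @ np.linalg.solve(G, u_hat))


def window_variance_pct(Z, a_hat, window=20):
    """Percent of genetic variance explained by consecutive, non-overlapping SNP windows."""
    total = np.var(Z @ a_hat)
    return np.array([
        100.0 * np.var(Z[:, s:s + window] @ a_hat[s:s + window]) / total
        for s in range(0, Z.shape[1], window)
    ])


def ss_gwas_sketch(M, y, h2=0.5, iterations=3, window=20):
    """Iteratively reweighted GBLUP -> SNP effects -> window variances (hypothetical)."""
    Z, p = center_genotypes(M)
    two_pq = 2.0 * p * (1.0 - p)
    lam = 1.0 / two_pq.sum()          # VanRaden-style scaling of G
    d = np.ones(Z.shape[1])           # first iteration: equal SNP weights
    for _ in range(iterations):
        u_hat, G = gebv_gblup(Z, d, y, lam, h2)
        a_hat = gebv_to_snp_effects(Z, d, G, u_hat, lam)
        # Reweight SNPs by their estimated variance, then rescale so that
        # sum(d * 2pq) stays equal to sum(2pq), i.e. G keeps a constant scale.
        d = a_hat ** 2 * two_pq
        d *= two_pq.sum() / (d * two_pq).sum()
    return a_hat, window_variance_pct(Z, a_hat, window)


# Simulated (hypothetical) data: 300 genotyped animals, 400 SNPs,
# one large QTL at SNP 105 (window index 5) plus small polygenic effects.
rng = np.random.default_rng(1)
n, m = 300, 400
freqs = rng.uniform(0.1, 0.9, m)
freqs[105] = 0.5
M = rng.binomial(2, freqs, size=(n, m)).astype(float)
a_true = rng.normal(0.0, 0.03, m)
a_true[105] = 1.0
g = (M - M.mean(axis=0)) @ a_true
y = 10.0 + g + rng.normal(0.0, g.std(), n)   # heritability of roughly 0.5

a_hat, pct = ss_gwas_sketch(M, y)
print(int(pct.argmax()), round(float(pct.max()), 1))
```

Because each window's variance is computed separately and covariances between windows are ignored, the percentages need not sum to 100. With one large simulated QTL, the window containing it (SNPs 100 to 119) would be expected to dominate after reweighting, loosely mirroring how iteration concentrates signal in ssGWAS.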