Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark.
Genet Sel Evol. 2019 May 10;51(1):20. doi: 10.1186/s12711-019-0463-9.
BACKGROUND: Genome-wide association studies (GWAS) are widely used to identify regions of the genome that harbor genetic determinants of quantitative traits. However, the multiple-testing burden from scanning tens of millions of whole-genome sequence variants reduces the power to identify associated variants, especially if sample size is limited. In addition, factors such as inaccurate imputation, complex linkage disequilibrium structures, and multiple closely located causal variants may result in an identified causative mutation not being the most significant single nucleotide polymorphism in a particular genomic region. Therefore, the use of information from different sources, particularly variant annotations, has been proposed to enhance the fine-mapping of causal variants. Here, we tested whether applying significance thresholds based on variant annotation categories increases the power of GWAS compared with a flat Bonferroni multiple-testing correction.
RESULTS: Whole-genome sequence variants in dairy cattle were categorized according to type and predicted impact. GWAS results for 17 quantitative traits were then analyzed for enrichment of association signals within each annotation category. When annotation categories were determined with the Variant Effect Predictor software and with datasets indicating regions of open chromatin, "low impact" variants were found to be highly enriched. Moreover, when the variants annotated as "modifier" and not located in open chromatin regions were further classified into different types of potential regulatory elements, the high impact variants, moderate impact variants, variants located in the 3' and 5' untranslated regions, and variants located in potential non-coding RNA regions exhibited relatively more enrichment. In contrast, a similar study on human GWAS data reported that enrichment of association signals was highest for high impact variants. We observed an increase in power when these variant category-based significance thresholds were applied to GWAS results on stature in Nordic Holstein cattle, as more candidate genes from a previous large GWAS meta-analysis for cattle stature were confirmed.
CONCLUSIONS: Use of variant category-based genome-wide significance thresholds can marginally increase the power to detect candidate genes in cattle. With continued improvements in the annotation of the bovine genome, we anticipate that variant category-based significance thresholds will become increasingly useful.
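The contrast between a flat Bonferroni correction and category-based thresholds can be illustrated with a minimal sketch. The abstract does not state the exact formula used, so the weighted Bonferroni scheme below is an assumption: each category gets a threshold proportional to an enrichment weight, rescaled so that the overall family-wise error bound stays at alpha. The function names, variant counts, and weights are hypothetical and are not results from the study.

```python
def flat_bonferroni(alpha, n_variants):
    """Single genome-wide threshold: alpha divided by the number of tests."""
    return alpha / n_variants


def category_thresholds(alpha, counts, weights):
    """Weighted Bonferroni thresholds per annotation category (assumed scheme).

    Weights are rescaled so that sum(counts[c] * threshold[c]) == alpha,
    which keeps the family-wise error bound equal to the flat correction.
    """
    total = sum(counts.values())
    scale = total / sum(counts[c] * weights[c] for c in counts)
    return {c: alpha * weights[c] * scale / total for c in counts}


# Hypothetical variant counts and enrichment weights, for illustration only.
counts = {"high": 1_000, "moderate": 50_000, "low": 100_000, "modifier": 9_849_000}
weights = {"high": 5.0, "moderate": 4.0, "low": 8.0, "modifier": 1.0}
alpha = 0.05

thresholds = category_thresholds(alpha, counts, weights)
flat = flat_bonferroni(alpha, sum(counts.values()))

# Expected number of false positives under the global null, summed over categories.
fwer_bound = sum(counts[c] * thresholds[c] for c in counts)

for category, threshold in thresholds.items():
    print(f"{category:>9}: {threshold:.3e} (flat: {flat:.3e})")
```

Under this scheme, enriched categories are tested against a more lenient threshold than the flat correction, while the large unenriched category is tested against a slightly stricter one, so any gain in power depends on how well the weights reflect true enrichment.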