Department of Molecular Biology and Genetics, Aarhus University, PO Box 50, Tjele, DK-8830, Denmark.
BMC Genet. 2012 Aug 14;13:71. doi: 10.1186/1471-2156-13-71.
Results from different genome-wide association studies (GWAS) in cattle often disagree markedly, and there are multiple reasons for this. In particular, the presence of false positives means that detected QTL need to be validated before they are incorporated or weighted in selection decisions or studied further to identify causal genes. In dairy cattle progeny testing schemes, new data are routinely accumulated and can be used to validate previously discovered associations. However, these data are not an independent sample, and the sample size may be too small to provide enough power to validate previous discoveries. Here we compared two strategies for validating previously detected QTL when new data are added from the same study population. First, we compared analyzing a combined dataset (COMB) that includes all data presently available with analyzing only a validation dataset (VAL), i.e. a new dataset not previously analyzed, as an independent replication. Second, we confirmed in the VAL dataset the SNPs detected in the reference population (REF), i.e. the previously analyzed dataset, which consists of older bulls.
The combined (COMB) dataset, which had nearly twice the sample size of either subset, allowed the detection of far more significant associations than the two smaller subsets. The number of significant SNPs in REF (older bulls) was about four times higher than in VAL (younger bulls), although the two had similar sample sizes (2,219 and 2,039, respectively). In the COMB dataset, a total of 424 SNP-trait combinations on 22 chromosomes, involving 284 unique SNPs, showed genome-wide significant association. In the REF dataset 101 associations (73 unique SNPs) and in the VAL dataset 24 associations (18 unique SNPs) were genome-wide significant. Sixty-eight percent of the SNPs in the REF dataset could be confirmed in the VAL dataset. Out of 469 unique SNPs showing chromosome-wide significant association with calving traits in the REF dataset, 321 could be confirmed in the VAL dataset at P < 0.05.
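The confirmation step described above can be illustrated with a minimal sketch. It assumes that a SNP counts as confirmed when it passes the discovery threshold in REF and also reaches P < 0.05 in VAL. The function name, the toy P-values, and the discovery threshold of 1e-5 are hypothetical and are not taken from the study. Whether the study also required the effect direction to agree is not stated here, so the sketch does not check it.

```python
def confirm_snps(ref_pvalues, val_pvalues, discovery_threshold, confirmation_alpha=0.05):
    """Return (discovered, confirmed) lists of SNP identifiers.

    discovered: SNPs with a REF P-value below the discovery threshold.
    confirmed:  discovered SNPs whose VAL P-value is below confirmation_alpha.
    SNPs missing from the VAL results are treated as not confirmed.
    """
    discovered = sorted(
        snp for snp, p in ref_pvalues.items() if p < discovery_threshold
    )
    confirmed = [
        snp for snp in discovered
        if val_pvalues.get(snp, 1.0) < confirmation_alpha
    ]
    return discovered, confirmed


# Toy P-values (hypothetical, for illustration only).
ref = {"snp1": 1e-7, "snp2": 3e-6, "snp3": 0.2, "snp4": 5e-6}
val = {"snp1": 0.01, "snp2": 0.30, "snp3": 0.001, "snp4": 0.04}

discovered, confirmed = confirm_snps(ref, val, discovery_threshold=1e-5)
toy_rate = len(confirmed) / len(discovered)

# Counts reported in the abstract: 321 of 469 chromosome-wide significant
# REF SNPs were confirmed in VAL at P < 0.05.
reported_rate = 321 / 469
print(f"toy confirmation rate: {toy_rate:.2f}")
print(f"reported confirmation rate: {reported_rate:.1%}")
```

Note that snp3 is significant in VAL but is not counted, because confirmation only applies to SNPs first discovered in REF. The reported counts of 321 out of 469 work out to about 68%, which is consistent with the percentage quoted above.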
The appropriate follow-up strategy for a GWAS in cattle depends on the aim of the study. If the aim is to discover novel QTL, analysis of the COMB dataset is recommended; if the aim is to identify the causal mutation underlying a QTL, confirmation of the discovered SNPs is necessary to avoid following up a false positive.