Shariati Mohammad M, Sørensen Peter, Janss Luc
Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, DK-8830 Tjele, Denmark.
BMC Proc. 2012 May 21;6 Suppl 2(Suppl 2):S12. doi: 10.1186/1753-6561-6-S2-S12.
In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter. A better alternative could be to form clusters of markers with similar effects where markers in a cluster have a common variance. Therefore, the influence of each marker group of size p on the posterior distribution of the marker variances will be p df.
The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency more than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1 and each 150 markers were assigned to one group with a common variance. In further analyses, subsets of 1500 and 450 markers with largest effects in step 2 were kept in the prediction model.
Grouping markers outperformed SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than Bayesian methods with marker specific variances.
Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. A prior knowledge of the genetic architecture of the trait is necessary for clustering markers and appropriate prior parameterization.
在为每个标记分配个体方差的基因组模型中,一个标记对方差后验分布的贡献只有一个自由度(df),这就引入了许多方差参数,而每个方差参数所包含的信息很少。更好的选择可能是将具有相似效应的标记聚成簇,使得一个簇中的标记具有共同的方差。因此,每个大小为p的标记组对方差后验分布的影响将是p个自由度。
对第15届QTL-MAS研讨会的模拟数据进行分析,根据单核苷酸多态性(SNP)标记的效应进行排序,并将估计效应相似的标记归为一组。在步骤1中,将所有次要等位基因频率大于0.01的标记纳入SNP最佳线性无偏预测(SNP-BLUP)模型。在步骤2中,根据步骤1中标记对性状的估计方差进行排序,每150个标记分为一组,具有共同的方差。在进一步分析中,将步骤2中效应最大的1500个和450个标记子集保留在预测模型中。
在预测育种值的准确性方面,标记分组优于SNP-BLUP模型。然而,预测育种值的准确性低于具有标记特异性方差的贝叶斯方法。
标记分组不如允许每个标记具有特定的标记方差灵活,但通过分组,估计标记方差的能力增强。标记聚类和适当的先验参数化需要对性状的遗传结构有先验知识。