Reduction in accuracy of genomic prediction for ordered categorical data compared to continuous observations.

Author information

Kizilkaya Kadir, Fernando Rohan L, Garrick Dorian J

Affiliations

Department of Animal Science, Iowa State University, Ames IA 50011, USA.

Publication information

Genet Sel Evol. 2014 Jun 9;46(1):37. doi: 10.1186/1297-9686-46-37.

DOI:10.1186/1297-9686-46-37

PMID:24912924

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4094927/

Abstract

BACKGROUND

Accuracy of genomic prediction depends on the number of records in the training population, heritability, effective population size, genetic architecture, and relatedness between training and validation populations. Many traits, including reproductive performance and susceptibility or resistance to disease, are recorded as ordered categories. Categorical scores are often recorded because they are easier to obtain than continuous observations. Bayesian linear regression has been extended to the threshold model for genomic prediction. The objective of this study was to quantify reductions in accuracy for ordinal categorical traits relative to continuous traits.
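The threshold model treats each ordinal score as a continuous, unobserved liability that has been cut into categories at fixed thresholds. The sketch below is a minimal illustration of two pieces of that idea under stated assumptions, not the authors' implementation: mapping liabilities to categories, and the data-augmentation step used in Bayesian threshold samplers, where each liability is drawn from a normal distribution truncated to its observed category's interval. It assumes known thresholds and a residual variance fixed at 1 (a common identification constraint); the function names and threshold values are hypothetical.

```python
import numpy as np
from scipy.stats import truncnorm


def liability_to_category(liability, thresholds):
    """Map liabilities to ordered categories 1..K.

    Category k is assigned when t_{k-1} < l <= t_k, with t_0 = -inf and t_K = +inf.
    """
    t = np.asarray(thresholds, dtype=float)
    # side="left" counts the thresholds strictly below each liability
    return np.searchsorted(t, liability, side="left") + 1


def sample_liabilities(categories, mean, thresholds, rng):
    """One data-augmentation step: draw each liability from N(mean, 1)
    truncated to the interval implied by its observed category."""
    cuts = np.concatenate(([-np.inf], np.asarray(thresholds, dtype=float), [np.inf]))
    categories = np.asarray(categories)
    lower = cuts[categories - 1]
    upper = cuts[categories]
    mean = np.asarray(mean, dtype=float)
    # truncnorm expects bounds standardized by loc and scale (scale = 1 here)
    return truncnorm.rvs(lower - mean, upper - mean, loc=mean, scale=1.0, random_state=rng)


rng = np.random.default_rng(1)
thresholds = [-0.5, 0.5]  # hypothetical cut points for a 3-category score
scores = liability_to_category(np.array([-1.0, 0.0, 2.0]), thresholds)
print(scores)  # [1 2 3]
```

In a typical Gibbs sampler for a threshold model, `mean` would be the current linear predictor (for example, the sum of fitted marker effects), and the sampled liabilities would then stand in for continuous phenotypes when updating the marker effects; sampling of the thresholds themselves is omitted here.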

METHODS

Efficiency of genomic prediction was evaluated for heritabilities of 0.10, 0.25 or 0.50. Phenotypes were simulated for 2250 purebred animals using 50 QTL selected from actual 50k SNP (single nucleotide polymorphism) genotypes, giving a proportion of causal to total loci of 0.0001. A BayesCπ threshold model simultaneously fitted all 50k markers except those that represented QTL. Estimated SNP effects were used to predict genomic breeding values in purebred (n = 239) or multibreed (n = 924) validation populations. Correlations between true and predicted genomic merit in validation populations were used to assess predictive ability.
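A minimal simulation sketch of the design described above, under assumptions that are not from the study: random 0/1/2 genotypes stand in for the actual 50k SNP data, QTL effects are drawn from a standard normal distribution, and the residual variance is scaled so that the realized heritability is close to the target. The helper names are hypothetical, and the BayesCπ fitting step itself is not shown. It illustrates removing the QTL columns from the marker panel and measuring accuracy as the correlation between true and predicted breeding values.

```python
import numpy as np


def simulate_liability(genotypes, n_qtl, h2, rng):
    """Simulate true breeding values from n_qtl randomly chosen loci and add
    normal residuals so that the realized heritability is close to h2."""
    n, p = genotypes.shape
    qtl = rng.choice(p, size=n_qtl, replace=False)  # causal loci, no repeats
    effects = rng.normal(0.0, 1.0, size=n_qtl)      # assumed QTL effect distribution
    tbv = genotypes[:, qtl] @ effects               # true breeding values
    var_e = tbv.var() * (1.0 - h2) / h2             # residual variance for target h2
    liability = tbv + rng.normal(0.0, np.sqrt(var_e), size=n)
    return tbv, liability, qtl


def marker_panel_without_qtl(genotypes, qtl):
    """Drop the QTL columns so the analysis model fits only non-causal markers."""
    return np.delete(genotypes, qtl, axis=1)


def accuracy(true_bv, predicted_bv):
    """Pearson correlation between true and predicted breeding values."""
    return float(np.corrcoef(true_bv, predicted_bv)[0, 1])


rng = np.random.default_rng(2014)
# hypothetical genotypes coded 0/1/2; the study used real 50k SNP genotypes instead
G = rng.binomial(2, 0.5, size=(2250, 1000)).astype(float)
tbv, y, qtl = simulate_liability(G, n_qtl=50, h2=0.25, rng=rng)
X = marker_panel_without_qtl(G, qtl)
print(X.shape)  # (2250, 950)
```

Ordinal scores for a threshold analysis could then be obtained by cutting the simulated underlying variable `y` at chosen thresholds, and `accuracy` would be applied to predictions for a separate validation set.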

RESULTS

Based on BayesCπ linear model analysis of the simulated underlying variable, accuracies of genomic estimated breeding values ranged from 0.12 to 0.66 for purebred and from 0.04 to 0.53 for multibreed validation populations. Accuracies for ordinal categorical scores analyzed by the BayesCπ threshold model were 20% to 50% lower, ranging from 0.04 to 0.55 for purebred and from 0.01 to 0.44 for multibreed validation populations. Analysis of ordinal categorical scores using a linear model resulted in further reductions in accuracy.

CONCLUSIONS

Threshold traits resulted in markedly lower accuracy than a linear model fitted to the underlying variable. To achieve an accuracy equal to or greater than that for continuous phenotypes with a training population of 1000 animals, a 2.25-fold increase in training population size (to 2250 animals) was required for categorical scores fitted with the threshold model. The threshold model resulted in higher accuracies than the linear model applied to categorical scores, and its advantage was greatest when training populations were smallest.
