Training set optimization of genomic prediction by means of EthAcc.

Affiliations

LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France.

GDEC, INRA, Clermont-Ferrand, France.

Publication information

PLoS One. 2019 Feb 19;14(2):e0205629. doi: 10.1371/journal.pone.0205629. eCollection 2019.

DOI:10.1371/journal.pone.0205629

PMID:30779753

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6380617/

Abstract

Genomic prediction is a useful tool for plant and animal breeding programs and is starting to be used to predict human diseases as well. A shortcoming that slows the deployment of genomic selection is that prediction accuracy is not known a priori. We propose EthAcc (Estimated THeoretical ACCuracy) as a method for estimating this accuracy from a training set that has been genotyped and phenotyped. EthAcc is based on a causal quantitative trait loci model estimated by a genome-wide association study. This estimated causal model is crucial; therefore, we compared different methods to find the one yielding the best EthAcc. The multilocus mixed model performed best. We compared EthAcc to accuracy estimators that can be derived via a mixed marker model and showed that EthAcc is the only approach that correctly estimates the accuracy. Moreover, in the case of a structured population, and in agreement with the accuracy actually achieved, EthAcc showed that the largest training set is not always better than a smaller but closer training set. We then performed training set optimization with EthAcc and compared it to CDmean. EthAcc outperformed CDmean on real datasets from sugar beet, maize, and wheat. Nonetheless, its performance was mainly due to the use of an optimal but inaccessible set as the starting point of the optimization algorithm. Issues with EthAcc's precision and with the algorithm prevent it from reaching a good training set from a random start. Despite this drawback, we demonstrated that a substantial gain in accuracy can be obtained by performing training set optimization.
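The abstract does not spell out the optimization algorithm, so the following is only a rough illustration of why the starting set matters. It is a minimal sketch of a generic exchange (swap) search that maximizes some training-set criterion. The names `exchange_optimize`, `mean_relatedness`, and `relatedness_to_test` are hypothetical. The toy criterion is an additive stand-in chosen for readability, not EthAcc and not CDmean.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def exchange_optimize(
    candidates: Sequence[int],
    n_train: int,
    criterion: Callable[[List[int]], float],
    start: Optional[List[int]] = None,
    n_iter: int = 1000,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Hypothetical greedy exchange search for a training set.

    Starts from `start` (or a random subset of size n_train), repeatedly
    proposes replacing one training individual with one outside candidate,
    and keeps the swap only if the criterion strictly improves.
    """
    rng = random.Random(seed)
    current = list(start) if start is not None else rng.sample(list(candidates), n_train)
    best = criterion(current)
    for _ in range(n_iter):
        outside = [c for c in candidates if c not in current]
        if not outside:
            break  # every candidate is already in the training set
        proposal = current.copy()
        proposal[rng.randrange(len(current))] = rng.choice(outside)
        score = criterion(proposal)
        if score > best:  # accept only strict improvements
            current, best = proposal, score
    return current, best


# Hypothetical stand-in criterion: mean relatedness of the training set to the
# individuals to be predicted (illustrative values, NOT EthAcc or CDmean).
relatedness_to_test = {0: 0.10, 1: 0.45, 2: 0.05, 3: 0.30, 4: 0.60, 5: 0.20}


def mean_relatedness(train: List[int]) -> float:
    return sum(relatedness_to_test[i] for i in train) / len(train)


train, score = exchange_optimize(list(relatedness_to_test), n_train=2,
                                 criterion=mean_relatedness)
print(sorted(train), round(score, 3))
```

Because this toy criterion is additive and exact, the greedy search reaches the best pair from any start. A criterion that is itself estimated with error, as the abstract reports for EthAcc, can make strict-improvement acceptance stall or follow noise, which is one plausible reason a good `start` would dominate the outcome.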
