Sample size determination for training set optimization in genomic prediction.

Affiliations

Department of Agronomy, National Taiwan University, Taipei, Taiwan.

Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, Germany.

Publication information

Theor Appl Genet. 2023 Mar 13;136(3):57. doi: 10.1007/s00122-023-04254-9.

Abstract

A practical approach is developed to determine a cost-effective optimal training set for selective phenotyping in a genomic prediction study. An R function is provided to facilitate the application of the approach. Genomic prediction (GP) is a statistical method used to select quantitative traits in animal or plant breeding. For this purpose, a statistical prediction model is first built that uses phenotypic and genotypic data in a training set. The trained model is then used to predict genomic estimated breeding values (GEBVs) for individuals within a breeding population. Setting the sample size of the training set usually takes into account time and space constraints that are inevitable in an agricultural experiment. However, the determination of the sample size remains an unresolved issue for a GP study. By applying the logistic growth curve to identify prediction accuracy for the GEBVs and the training set size, a practical approach was developed to determine a cost-effective optimal training set for a given genome dataset with known genotypic data. Three real genome datasets were used to illustrate the proposed approach. An R function is provided to facilitate widespread application of this approach to sample size determination, which can help breeders to identify a set of genotypes with an economical sample size for selective phenotyping.
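The idea of relating prediction accuracy to training set size through a logistic growth curve can be illustrated with a minimal Python sketch. It is not the authors' R function: the three-parameter form `K / (1 + b * exp(-r * n))`, the grid-search fit, the 95% threshold, the function names, and the synthetic data below are all assumptions made for illustration only.

```python
import math

def logistic(n, K, b, r):
    """Logistic growth curve: accuracy approaches the asymptote K as n grows."""
    return K / (1.0 + b * math.exp(-r * n))

def fit_logistic(sizes, accs, k_max=1.0, k_step=0.001):
    """Fit (K, b, r) by a grid search over K (hypothetical helper).

    For a fixed K, log(K / acc - 1) = log(b) - r * n is linear in n,
    so b and r follow from ordinary least squares on the transformed data.
    The K with the smallest squared error on the original scale is kept.
    """
    m = len(sizes)
    mean_n = sum(sizes) / m
    sxx = sum((n - mean_n) ** 2 for n in sizes)
    best = None
    K = max(accs) + k_step  # the asymptote must exceed every observed accuracy
    while K <= k_max:
        y = [math.log(K / a - 1.0) for a in accs]
        mean_y = sum(y) / m
        slope = sum((n - mean_n) * (v - mean_y) for n, v in zip(sizes, y)) / sxx
        b, r = math.exp(mean_y - slope * mean_n), -slope
        sse = sum((logistic(n, K, b, r) - a) ** 2 for n, a in zip(sizes, accs))
        if best is None or sse < best[0]:
            best = (sse, K, b, r)
        K += k_step
    return best[1:]

def size_for_fraction(K, b, r, fraction=0.95):
    """Smallest n at which the fitted curve reaches `fraction` of its asymptote K."""
    return math.log(b * fraction / (1.0 - fraction)) / r

# Synthetic, noise-free accuracies (NOT data from the paper), generated from
# assumed parameters K=0.8, b=5, r=0.02 at candidate training set sizes 25..300.
sizes = list(range(25, 301, 25))
accs = [logistic(n, 0.8, 5.0, 0.02) for n in sizes]

K_hat, b_hat, r_hat = fit_logistic(sizes, accs)
n_opt = size_for_fraction(K_hat, b_hat, r_hat)
print(f"asymptote ~ {K_hat:.3f}, suggested training set size ~ {math.ceil(n_opt)}")
```

In this sketch the suggested size is the point beyond which adding individuals to the training set yields little further gain in accuracy; the cut-off fraction is a tunable choice here, and the published method may define its cost-effective optimum differently.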

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0969/10011335/b813079ee169/122_2023_4254_Fig1_HTML.jpg