Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts.

Author information

Pérez-Cabal M Angeles, Vazquez Ana I, Gianola Daniel, Rosa Guilherme J M, Weigel Kent A

Affiliation

Department of Animal Production, Complutense University of Madrid, Madrid, Spain.

Publication information

Front Genet. 2012 Feb 28;3:27. doi: 10.3389/fgene.2012.00027. eCollection 2012.

Abstract

The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individuals (RAN) or from a kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced similar accuracies to the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may depend also on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, which can be compensated by half-sibs or grandsires in the case of lack of close relatives. However, for the low heritability traits the inclusion of close relatives is crucial and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models, e.g., within or between family predictions, or within or across generation predictions, such that estimation of predictive ability is consistent with the actual application to be considered.
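The three cross-validation layouts named in the abstract can be illustrated with a small sketch. This is not the authors' procedure: the abstract does not name the clustering algorithm behind UNREL, so a spectral bipartition (the sign of the Fiedler vector of a graph Laplacian built from the relationship matrix) is swapped in as a stand-in. All function names, the toy relationship matrix, and the generation labels are hypothetical and exist only for illustration.

```python
import numpy as np


def random_split(n, test_frac, rng):
    """RAN-style layout: allocate individuals to the two sets at random."""
    idx = rng.permutation(n)
    n_test = int(round(test_frac * n))
    return np.sort(idx[n_test:]), np.sort(idx[:n_test])


def generation_split(generation):
    """GEN-style layout: train on older generations, test on the youngest."""
    generation = np.asarray(generation)
    is_test = generation == generation.max()
    return np.flatnonzero(~is_test), np.flatnonzero(is_test)


def unrelated_split(A):
    """UNREL-style stand-in: spectral bipartition of the relationship graph.

    Assumption: the abstract does not name the clustering algorithm, so this
    uses the sign of the Fiedler vector of the graph Laplacian built from A.
    """
    W = np.array(A, dtype=float)
    np.fill_diagonal(W, 0.0)             # ignore self-relationships
    laplacian = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    side = vecs[:, 1] >= 0               # eigenvector of 2nd-smallest eigenvalue
    return np.flatnonzero(side), np.flatnonzero(~side)


def mean_cross_relationship(A, train, test):
    """Average relationship between training and testing individuals."""
    return float(np.asarray(A)[np.ix_(train, test)].mean())


# Toy relationship matrix (made up): two families of three animals,
# 0.5 within a family and 0.05 between families.
A = np.full((6, 6), 0.05)
A[:3, :3] = 0.5
A[3:, 3:] = 0.5
np.fill_diagonal(A, 1.0)
generation = [0, 0, 1, 0, 1, 1]  # hypothetical generation labels

rng = np.random.default_rng(0)
layouts = {
    "RAN": random_split(6, 0.5, rng),
    "GEN": generation_split(generation),
    "UNREL": unrelated_split(A),
}
for name, (train, test) in layouts.items():
    print(name, train, test, round(mean_cross_relationship(A, train, test), 3))
```

On this toy matrix the UNREL-style split separates the two families, so its mean training-testing relationship is the lowest that any split can reach. As the abstract cautions, a lower average relationship does not by itself imply lower predictive accuracy, so the statistic printed here describes the layout and does not predict model performance.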

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2997/3288819/48d458d9db94/fgene-03-00027-g001.jpg
