Environment ensemble models for genomic prediction in common bean (Phaseolus vulgaris L.).

Authors

Chiaravallotti Isabella, Pauptit Owen, Hoyos-Villegas Valerio

Affiliations

Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA.

School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK.

Publication information

Plant Genome. 2025 Jun;18(2):e70057. doi: 10.1002/tpg2.70057.

Abstract

For important food crops such as the common bean (Phaseolus vulgaris L.), global demand continues to outpace the rate of genetic gain for quantitative traits. In this study, we leveraged the multi-environment trial (MET) dataset from the Cooperative Dry Bean Nursery (CDBN) to investigate the use of ensemble models for genomic prediction. This dataset spans 70 locations and 30 years, and includes over 150 phenotypes and hundreds of genotypes sequenced for 1.2 million single nucleotide polymorphism markers. We tested three models (linear regression, ridge regression, and neural networks). Each of the three models was implemented using three different approaches: (1) all data were combined into one model (singular model), (2) all available single locations were used to train individual submodels comprising one ensemble model (ensemble model), and (3) optimized sets of single locations were used to train individual submodels comprising one ensemble model (optimized ensemble model). The optimized ensemble approach worked best for low-variance locations because the model variance was reduced by averaging across submodels in the ensemble. For models with low prediction accuracy, the ensemble approach can increase accuracy. In certain locations, prediction accuracy exceeded narrow-sense heritability, indicating that genomic selection is more efficient than phenotypic selection in these locations. This study indicates that collaboration among breeding programs can be a way to bypass the bottleneck of low data volume, as pooled data from the CDBN MET produced prediction accuracies of 0.70 for days to flowering, 0.54 for days to maturity, 0.95 for seed weight, and 0.67 for seed yield in individual locations.
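
The difference between the singular and ensemble approaches described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the ridge penalty `alpha`, the helper names (`fit_ridge`, `fit_singular`, `fit_ensemble`, `predict_ensemble`), the unweighted averaging of submodel predictions, and the use of Pearson correlation as prediction accuracy are all assumptions made for illustration. The optimized ensemble is not shown because the abstract does not say how the location sets were chosen; under this sketch it would amount to passing a subset of the `train` dictionary to `fit_ensemble`.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc'Xc + alpha * I) w = Xc'yc for the marker effects w.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def predict(model, X):
    w, b = model
    return X @ w + b


def fit_singular(data, alpha=1.0):
    """Approach 1: pool every location into one training set."""
    X_all = np.vstack([X_loc for X_loc, _ in data.values()])
    y_all = np.concatenate([y_loc for _, y_loc in data.values()])
    return fit_ridge(X_all, y_all, alpha)


def fit_ensemble(data, alpha=1.0):
    """Approach 2: train one submodel per location."""
    return {loc: fit_ridge(X_loc, y_loc, alpha) for loc, (X_loc, y_loc) in data.items()}


def predict_ensemble(submodels, X):
    """Average submodel predictions (unweighted mean, an assumption)."""
    return np.mean([predict(m, X) for m in submodels.values()], axis=0)


def accuracy(y_true, y_pred):
    """Prediction accuracy taken here as the Pearson correlation."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic stand-in for marker and phenotype data (not CDBN data).
rng = np.random.default_rng(0)
n_markers = 50
true_effects = rng.normal(size=n_markers)


def simulate(n_lines, noise_sd):
    # Genotypes coded 0/1/2; phenotype = additive effects + noise.
    X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
    y = X @ true_effects + rng.normal(scale=noise_sd, size=n_lines)
    return X, y


train = {f"loc{i}": simulate(80, 2.0) for i in range(5)}
X_test, y_test = simulate(40, 2.0)

singular = fit_singular(train)
ensemble = fit_ensemble(train)

pred_singular = predict(singular, X_test)
pred_ensemble = predict_ensemble(ensemble, X_test)

acc_singular = accuracy(y_test, pred_singular)
acc_ensemble = accuracy(y_test, pred_ensemble)
print(f"singular: {acc_singular:.2f}  ensemble: {acc_ensemble:.2f}")
```

Averaging across submodels is the step the abstract credits with reducing model variance. In this toy setting every location shares the same marker effects, so the sketch says nothing about which approach performs better on real multi-environment data.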
