

TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield.

Affiliations

Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China.

Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68583, USA.

Publication information

Plant Commun. 2024 Jul 8;5(7):100975. doi: 10.1016/j.xplc.2024.100975. Epub 2024 May 15.

Abstract

Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, making predictions from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different, but related, source domain and is considered a promising method for improving yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction owing to the lack of an efficient implementation framework. We therefore developed TrG2P, a transfer-learning-based framework. TrG2P first employs convolutional neural networks (CNN) to train models using non-yield-trait phenotypic and genotypic data, thus obtaining pre-trained models. Subsequently, the convolutional layer parameters from these pre-trained models are transferred to the yield prediction task, and the fully connected layers are retrained, thus obtaining fine-tuned models. Finally, the convolutional layer and the first fully connected layer of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum) and compared its model precision to that of seven other popular GS tools: ridge regression best linear unbiased prediction (rrBLUP), random forest, support vector regression, light gradient boosting machine (LightGBM), CNN, DeepGS, and deep neural network for genomic prediction (DNNGP). TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively, compared with predictions generated by the best-performing comparison model. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield-trait data.
We attribute TrG2P's enhanced prediction accuracy to the valuable information available from traits associated with yield and to training dataset augmentation. The Python implementation of TrG2P is available at https://github.com/lijinlong1991/TrG2P. The web-based tool is available at http://trg2p.ebreed.cn:81.
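The three-stage procedure summarised in the abstract can be illustrated with the minimal pure-Python sketch below. The three stages are: pre-training on non-yield traits; transferring the convolutional parameters and retraining the fully connected layers on yield; and fusing the fine-tuned models, then training a final layer. This is not the TrG2P code from the GitHub repository. Everything in the sketch is an assumption made for illustration only:

- the toy architecture: one 1-D convolution filter, one hidden fully connected layer, ReLU activations;
- per-sample stochastic gradient descent and its hyperparameters;
- the hypothetical helper names (`init_model`, `train`, `fine_tune`, `fused_features`, `train_fusion_head`, `predict`);
- the fusion rule, which concatenates each fine-tuned model's first fully connected layer activations;
- the randomly generated genotype and trait data.

```python
import math
import random

# Hypothetical toy model, NOT the published TrG2P architecture:
# 1-D convolution (one filter) -> ReLU -> FC1 -> ReLU -> FC2 (scalar output).

def relu(v):
    return v if v > 0.0 else 0.0

def init_model(n_markers, width, hidden, rng):
    m = n_markers - width + 1  # length of the "valid" convolution output
    def u():
        return rng.uniform(-0.1, 0.1)
    return {"conv_w": [u() for _ in range(width)], "conv_b": 0.0,
            "fc1_w": [[u() for _ in range(m)] for _ in range(hidden)],
            "fc1_b": [0.0] * hidden,
            "fc2_w": [u() for _ in range(hidden)], "fc2_b": 0.0}

def forward(model, x):
    """Return conv pre-activations/activations, FC1 pre-activations/activations, output."""
    w = model["conv_w"]
    z = [model["conv_b"] + sum(w[k] * x[j + k] for k in range(len(w)))
         for j in range(len(x) - len(w) + 1)]
    a = [relu(v) for v in z]
    hp = [b + sum(wi * ai for wi, ai in zip(row, a))
          for row, b in zip(model["fc1_w"], model["fc1_b"])]
    h = [relu(v) for v in hp]
    return z, a, hp, h, model["fc2_b"] + sum(wi * hi for wi, hi in zip(model["fc2_w"], h))

def train(model, X, y, lr=0.005, epochs=20, train_conv=True):
    """Per-sample SGD on squared error; the conv layer is frozen when train_conv is False."""
    for _ in range(epochs):
        for x, t in zip(X, y):
            z, a, hp, h, out = forward(model, x)
            d_out = out - t
            # Backpropagate through FC2 and the ReLU that follows FC1.
            dhp = [d_out * w2 if v > 0.0 else 0.0 for w2, v in zip(model["fc2_w"], hp)]
            # Gradient w.r.t. conv pre-activations, taken before FC1 weights change.
            dz = [sum(dhp[i] * model["fc1_w"][i][j] for i in range(len(dhp)))
                  if z[j] > 0.0 else 0.0 for j in range(len(z))]
            for i in range(len(h)):
                model["fc2_w"][i] -= lr * d_out * h[i]
                model["fc1_b"][i] -= lr * dhp[i]
                for j in range(len(a)):
                    model["fc1_w"][i][j] -= lr * dhp[i] * a[j]
            model["fc2_b"] -= lr * d_out
            if train_conv:
                for k in range(len(model["conv_w"])):
                    model["conv_w"][k] -= lr * sum(dz[j] * x[j + k] for j in range(len(z)))
                model["conv_b"] -= lr * sum(dz)
    return model

def fine_tune(pretrained, X, y, hidden, rng):
    """Copy (and freeze) pre-trained conv parameters; re-initialise and retrain FC layers."""
    model = init_model(len(X[0]), len(pretrained["conv_w"]), hidden, rng)
    model["conv_w"], model["conv_b"] = list(pretrained["conv_w"]), pretrained["conv_b"]
    return train(model, X, y, train_conv=False)

def fused_features(models, x):
    """Concatenate the FC1 activations of every fine-tuned model (assumed fusion rule)."""
    return [hi for m in models for hi in forward(m, x)[3]]

def train_fusion_head(models, X, y, lr=0.005, epochs=50):
    """Train only a new linear output layer on the fused features."""
    w, b = [0.0] * len(fused_features(models, X[0])), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            f = fused_features(models, x)
            err = b + sum(wi * fi for wi, fi in zip(w, f)) - t
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(models, head, x):
    w, b = head
    return b + sum(wi * fi for wi, fi in zip(w, fused_features(models, x)))

# Synthetic demo data (illustrative only): 20 lines x 12 SNPs coded 0/1/2.
rng = random.Random(0)
X = [[rng.choice([0, 1, 2]) for _ in range(12)] for _ in range(20)]
plant_height = [0.3 * x[2] + 0.2 * x[7] for x in X]    # synthetic non-yield trait
days_to_flower = [0.4 * x[5] - 0.1 * x[9] for x in X]  # synthetic non-yield trait
grain_yield = [0.5 * x[2] + 0.3 * x[5] for x in X]     # synthetic target trait
# Stage 1: pre-train one model per non-yield trait.
pretrained = [train(init_model(12, 3, 4, rng), X, t) for t in (plant_height, days_to_flower)]
# Stage 2: transfer conv parameters and retrain the FC layers on yield.
finetuned = [fine_tune(p, X, grain_yield, 4, rng) for p in pretrained]
# Stage 3: fuse FC1 activations and train a new output layer on yield.
head = train_fusion_head(finetuned, X, grain_yield)
rmse = math.sqrt(sum((predict(finetuned, head, x) - t) ** 2
                     for x, t in zip(X, grain_yield)) / len(X))
print(f"training RMSE: {rmse:.3f}")
```

In this sketch, the copied convolution parameters are frozen in stage 2 (`train_conv=False`), so the features learned from the non-yield traits carry over unchanged to the yield task. Stage 3 then learns only how to weight the hidden representations of the several fine-tuned models. A practical version would more likely use a deep-learning framework, many filters, and held-out validation data rather than training RMSE.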


