TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield.

Affiliations

Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China.

Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68583, USA.

Publication information

Plant Commun. 2024 Jul 8;5(7):100975. doi: 10.1016/j.xplc.2024.100975. Epub 2024 May 15.

DOI: 10.1016/j.xplc.2024.100975

PMID: 38751121

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11287160/

Abstract

Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, making predictions from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different, but related, source domain, and it is considered a promising method for improving yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction owing to the lack of an efficient implementation framework. We therefore developed TrG2P, a transfer-learning-based framework. TrG2P first employs convolutional neural networks (CNNs) to train models using non-yield-trait phenotypic and genotypic data, thus obtaining pre-trained models. Subsequently, the convolutional layer parameters from these pre-trained models are transferred to the yield prediction task, and the fully connected layers are retrained, thus obtaining fine-tuned models. Finally, the convolutional layer and the first fully connected layer of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum) and compared its prediction accuracy with that of seven other popular GS tools: ridge regression best linear unbiased prediction (rrBLUP), random forest, support vector regression, light gradient boosting machine (LightGBM), CNN, DeepGS, and deep neural network for genomic prediction (DNNGP). TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively, compared with predictions generated by the best-performing comparison model. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield-trait data. We attribute its enhanced prediction accuracy to the valuable information available from traits associated with yield and to training dataset augmentation. The Python implementation of TrG2P is available at https://github.com/lijinlong1991/TrG2P. The web-based tool is available at http://trg2p.ebreed.cn:81.
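The three stages described in the abstract (pre-train on non-yield traits, transfer and freeze the convolutional layers while retraining the output layers on yield, then fuse the fine-tuned models and train a final layer) can be illustrated with a small pure-Python sketch. This is not the TrG2P implementation. Everything in it is an assumption made for illustration: the synthetic genotypes and traits, the toy sizes `K` and `F`, the helper names (`new_conv`, `conv_features`, `train`, `predict`), and the simplification of each network to one convolutional layer plus a single linear output layer, so that the fused representation is just the concatenated convolutional features rather than the convolutional layer plus first fully connected layer used in the paper.

```python
import copy
import random

K, F = 3, 2  # toy kernel width and number of filters (assumed values)

def new_conv(rng):
    """One 1D convolutional layer: F kernels of width K plus biases."""
    return {"w": [[rng.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(F)],
            "b": [0.0] * F}

def conv_features(conv, x):
    """ReLU convolution then mean pooling; also returns the ReLU masks."""
    n_pos = len(x) - K + 1
    feats, masks = [], []
    for w, b in zip(conv["w"], conv["b"]):
        acts = [sum(w[i] * x[j + i] for i in range(K)) + b for j in range(n_pos)]
        masks.append([a > 0 for a in acts])
        feats.append(sum(a for a in acts if a > 0) / n_pos)
    return feats, masks

def predict(convs, head, x):
    """Linear output layer on the concatenated features of all conv layers."""
    feats = [f for c in convs for f in conv_features(c, x)[0]]
    return sum(v * f for v, f in zip(head["v"], feats)) + head["c"]

def train(convs, data, train_conv, epochs=50, lr=0.01):
    """Fit a fresh linear head by SGD on squared error.

    When train_conv is False the convolutional layers are frozen and only
    the head is updated.
    """
    head = {"v": [0.0] * (F * len(convs)), "c": 0.0}
    for _ in range(epochs):
        for x, t in data:
            per_conv = [conv_features(c, x) for c in convs]
            feats = [f for fs, _ in per_conv for f in fs]
            err = sum(v * f for v, f in zip(head["v"], feats)) + head["c"] - t
            grad_feats = [err * v for v in head["v"]]  # taken before the head update
            head["v"] = [v - lr * err * f for v, f in zip(head["v"], feats)]
            head["c"] -= lr * err
            if not train_conv:
                continue
            k = 0  # index of the current filter within the concatenated features
            for conv, (_, masks) in zip(convs, per_conv):
                for fi, mask in enumerate(masks):
                    g = grad_feats[k] / len(mask)
                    for j, active in enumerate(mask):
                        if active:
                            for i in range(K):
                                conv["w"][fi][i] -= lr * g * x[j + i]
                            conv["b"][fi] -= lr * g
                    k += 1
    return head

# Synthetic data (hypothetical): 40 lines, 12 markers coded 0/1/2,
# two non-yield traits, and a yield that depends on both.
rng = random.Random(0)
genotypes = [[rng.choice([0, 1, 2]) for _ in range(12)] for _ in range(40)]
trait_a = [sum(x[:6]) / 6 for x in genotypes]
trait_b = [sum(x[6:]) / 6 for x in genotypes]
yield_ = [a + 0.5 * b + rng.gauss(0, 0.1) for a, b in zip(trait_a, trait_b)]
yield_data = list(zip(genotypes, yield_))

# Stage 1: pre-train one model per non-yield trait (conv layer and head both learn).
pretrained = []
for trait in (trait_a, trait_b):
    conv = new_conv(rng)
    train([conv], list(zip(genotypes, trait)), train_conv=True)
    pretrained.append(conv)

# Stage 2: transfer the conv parameters, freeze them, retrain the head on yield.
finetuned = [copy.deepcopy(c) for c in pretrained]
finetuned_heads = [train([c], yield_data, train_conv=False) for c in finetuned]

# Stage 3: fuse the frozen feature extractors and train only the final layer.
fusion_head = train(finetuned, yield_data, train_conv=False)
prediction = predict(finetuned, fusion_head, genotypes[0])
```

The design point the sketch tries to capture is that only stage 1 updates the convolutional parameters. Stages 2 and 3 reuse them unchanged, so the yield model inherits whatever the non-yield traits taught the feature extractor. In a real setting the frozen and trainable parts would normally be handled by a deep learning framework rather than hand-written gradients.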





