Ye Chonghang, Li Kai, Sun Weicheng, Jiang Yiwei, Zhang Weihan, Zhang Ping, Hu Yi-Juan, Han Yuepeng, Li Li
Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
Hubei Hongshan Laboratory, Wuhan 430070, China.
Genes (Basel). 2025 Mar 31;16(4):411. doi: 10.3390/genes16040411.
Genomic prediction predicts phenotypic traits from genotypic information and can thereby accelerate trait improvement in plant breeding. Traditional genomic prediction methods rely primarily on linear mixed models, such as Genomic Best Linear Unbiased Prediction (GBLUP), and on conventional machine learning methods, such as Support Vector Regression (SVR). These methods are limited in handling high-dimensional data and nonlinear relationships, so deep learning methods have also been applied to genomic prediction in recent years. We propose iADEP, an Integrated Additive, Dominant, and Epistatic Prediction model based on deep learning. Specifically, single nucleotide polymorphism (SNP) data, together with latent genetic interactions and genome-wide association study (GWAS) results that serve as biological prior knowledge, are fused into an SNP embedding block, which is then fed to a local encoder. The local encoder is fused, through a multi-head attention mechanism, with a global decoder that incorporates omics data, and the result is passed to multilayer perceptrons. First, experiments on four datasets show that iADEP outperforms existing methods in genotype-to-phenotype prediction. Second, ablation experiments validate the effectiveness of the SNP embedding. Third, we provide a module for combining other omics data in iADEP and propose a novel method for fusing them. Fourth, we explore the impact of feature selection on iADEP performance and conclude that using the full set of SNPs generally gives the best results. Finally, by altering the partition of training and testing sets, we investigate the differences between transductive and inductive learning. iADEP offers a promising new approach to AI breeding that integrates biological prior knowledge and can be combined with other omics data.
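The abstract does not say how iADEP encodes additive, dominant, and epistatic effects, so the following is only a minimal sketch of the textbook quantitative-genetics coding, not the authors' implementation. It assumes genotypes are given as minor-allele counts (0, 1, or 2) and limits epistasis to pairwise additive-by-additive products; the function names are hypothetical.

```python
from itertools import combinations


def encode_additive(genotype):
    """Additive coding: the minor-allele count (0, 1, or 2) is used directly."""
    return [float(g) for g in genotype]


def encode_dominance(genotype):
    """Dominance coding: 1 for heterozygotes (count == 1), 0 for homozygotes."""
    return [1.0 if g == 1 else 0.0 for g in genotype]


def encode_epistasis(additive):
    """Pairwise additive-by-additive terms: the product for every SNP pair.

    Assumption: only second-order interactions are illustrated here.
    """
    pairs = combinations(range(len(additive)), 2)
    return [additive[i] * additive[j] for i, j in pairs]


def encode_individual(genotype):
    """Collect the three feature groups for one individual."""
    additive = encode_additive(genotype)
    return {
        "additive": additive,
        "dominance": encode_dominance(genotype),
        "epistasis": encode_epistasis(additive),
    }


# One individual genotyped at four SNPs (illustrative values only).
features = encode_individual([0, 1, 2, 1])
```

The number of pairwise terms grows quadratically with the number of SNPs (n(n-1)/2 for n markers), so enumerating them explicitly is only practical for small marker sets.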