Yao Zhou, Yao Mengting, Wang Chuang, Li Ke, Guo Junhao, Xiao Yingjie, Yan Jianbing, Liu Jianxiao
National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
Mol Plant. 2025 Mar 3;18(3):527-549. doi: 10.1016/j.molp.2025.01.020. Epub 2025 Jan 28.
The integration of genotypic and environmental data can enhance genomic prediction accuracy for crop field traits. Existing genomic prediction methods fail to consider environmental factors and the real growth environments of crops, resulting in low prediction accuracy. In this work, we developed GEFormer, a genomic prediction method for genotype-environment interaction that integrates gating multilayer perceptron (gMLP) and linear attention mechanisms. First, GEFormer uses gMLP to extract local and global features among single-nucleotide polymorphisms (SNPs). Then, it uses Omni-dimensional Dynamic Convolution to extract dynamic and comprehensive features of multiple environmental factors within each day, taking into account the real growth pattern of crops, and a linear attention mechanism to capture the temporal features of environmental changes. Finally, GEFormer uses a gating mechanism to fuse the genomic and environmental features. We examined the accuracy of GEFormer in predicting important agronomic traits of maize, rice, and wheat under three experimental scenarios: untested genotypes in tested environments, tested genotypes in untested environments, and untested genotypes in untested environments. GEFormer outperformed six cutting-edge statistical learning methods and four machine learning methods, with an especially large advantage in the scenario of untested genotypes in untested environments. In addition, we applied GEFormer to three real-world breeding applications: phenotype prediction in unknown environments, hybrid phenotype prediction using an inbred population, and cross-population phenotype prediction. GEFormer showed better prediction performance in these breeding scenarios as well, suggesting that it could be used to assist crop breeding.
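The abstract names two building blocks without giving formulas: linear attention over the time series of environmental features, and a gate that fuses genomic and environmental features. The following is a minimal sketch of what such components commonly look like, not the GEFormer implementation. It assumes the kernel-based form of linear attention with an `elu(x) + 1` feature map and an element-wise sigmoid gate. The function names and the `gate_logits` argument (which in a trained model would come from a learned projection) are hypothetical.

```python
import math


def feature_map(x):
    """Positive feature map elu(x) + 1 (an assumed choice, not from the paper)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Non-causal linear attention over T time steps (e.g. days).

    Each argument is a list of T vectors. The key/value summary is built
    once and reused for every query, so cost grows linearly with T.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j phi(k_j) v_j^T
    k_sum = [0.0] * d_k                     # sum_j phi(k_j)
    for k, v in zip(keys, values):
        pk = feature_map(k)
        for a in range(d_k):
            k_sum[a] += pk[a]
            for b in range(d_v):
                kv[a][b] += pk[a] * v[b]

    out = []
    for q in queries:
        pq = feature_map(q)
        norm = sum(pq[a] * k_sum[a] for a in range(d_k))  # always > 0
        out.append(
            [sum(pq[a] * kv[a][b] for a in range(d_k)) / norm for b in range(d_v)]
        )
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(geno, env, gate_logits):
    """Element-wise gate: g * geno + (1 - g) * env, with g = sigmoid(logit)."""
    fused = []
    for g_feat, e_feat, z in zip(geno, env, gate_logits):
        g = sigmoid(z)
        fused.append(g * g_feat + (1.0 - g) * e_feat)
    return fused
```

Because the feature map is strictly positive, each attention output is a weighted average of the value vectors. The gate interpolates between the two feature sources per dimension, so a logit of zero gives an equal mix and a large positive logit passes the genomic feature through almost unchanged.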