Powadi Anirudha, Jubery Talukder Zaki, Tross Michael C, Schnable James C, Ganapathysubramanian Baskar
Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, United States.
Translational AI Research and Education Center, Iowa State University, Ames, IA, United States.
Front Plant Sci. 2024 Dec 16;15:1476070. doi: 10.3389/fpls.2024.1476070. eCollection 2024.
In plant breeding and genetics, predictive models traditionally rely on compact representations of high-dimensional data, often obtained with methods such as Principal Component Analysis (PCA) and, more recently, Autoencoders (AE). However, these methods do not separate genotype-specific from environment-specific features, which limits their ability to accurately predict traits influenced by both genetic and environmental factors. We hypothesize that disentangling these representations into genotype-specific and environment-specific components can enhance predictive models. To test this, we developed a compositional autoencoder (CAE) that decomposes high-dimensional data into distinct genotype-specific and environment-specific latent features. Our CAE framework employs a hierarchical architecture within an autoencoder to separate these entangled latent features. Applied to a maize diversity panel dataset, the CAE modeled environmental influences more effectively and outperformed PCA, Partial Least Squares Regression (PLSR), and vanilla autoencoders, improving predictive performance 7-fold for the 'Days to Pollen' trait and 10-fold for 'Yield'. By disentangling latent features, the CAE provides a powerful tool for precision breeding and genetic research. This work substantially improves trait prediction models, advancing agricultural and biological sciences.
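To make the idea of a compositional latent space concrete, here is a minimal forward-pass sketch under stated assumptions; it is not the authors' CAE. Each observation (one genotype grown in one environment) is encoded, the code is split into a genotype block and an environment block, and each block is averaged over all observations that share that genotype or environment, so the decoder only sees features that are constant within a genotype or within an environment by construction. The function name `compositional_forward`, the layer sizes, the mean-pooling step, the linear decoder, and the random untrained weights are all hypothetical choices for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def compositional_forward(X, geno_ids, env_ids, params):
    """Hypothetical forward pass of a compositional autoencoder.

    X: (n_obs, n_feat) data; geno_ids / env_ids: (n_obs,) integer labels.
    Returns genotype latents, environment latents, and the reconstruction.
    """
    H = relu(X @ params["W_enc"])      # per-observation hidden code
    z_g_obs = H @ params["W_g"]        # candidate genotype-specific features
    z_e_obs = H @ params["W_e"]        # candidate environment-specific features

    # Pool genotype features: every observation of genotype g gets the same code.
    z_g = np.zeros_like(z_g_obs)
    for g in np.unique(geno_ids):
        mask = geno_ids == g
        z_g[mask] = z_g_obs[mask].mean(axis=0)

    # Pool environment features the same way, grouping by environment.
    z_e = np.zeros_like(z_e_obs)
    for e in np.unique(env_ids):
        mask = env_ids == e
        z_e[mask] = z_e_obs[mask].mean(axis=0)

    Z = np.concatenate([z_g, z_e], axis=1)  # composed latent [genotype | environment]
    X_hat = Z @ params["W_dec"]             # linear decoder back to feature space
    return z_g, z_e, X_hat


# Toy usage: 4 genotypes x 3 environments, 50 features per observation.
rng = np.random.default_rng(0)
n_obs, n_feat, n_hidden, d_g, d_e = 12, 50, 16, 4, 3
params = {
    "W_enc": rng.normal(size=(n_feat, n_hidden)) * 0.1,
    "W_g": rng.normal(size=(n_hidden, d_g)) * 0.1,
    "W_e": rng.normal(size=(n_hidden, d_e)) * 0.1,
    "W_dec": rng.normal(size=(d_g + d_e, n_feat)) * 0.1,
}
X = rng.normal(size=(n_obs, n_feat))
geno_ids = np.repeat(np.arange(4), 3)  # [0,0,0,1,1,1,...]
env_ids = np.tile(np.arange(3), 4)     # [0,1,2,0,1,2,...]

z_g, z_e, X_hat = compositional_forward(X, geno_ids, env_ids, params)
mse = np.mean((X - X_hat) ** 2)  # reconstruction loss a training loop would minimize
```

In a real model the weights would be learned by minimizing a reconstruction loss such as `mse` with gradient descent; the pooled `z_g` or `z_e` blocks could then serve as inputs to a downstream trait predictor.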