Aviles Toledo Claudia, Crawford Melba M, Tuinstra Mitchell R
Lyles School of Civil Engineering, Purdue University, West Lafayette, IN, United States.
Department of Agronomy, Purdue University, West Lafayette, IN, United States.
Front Plant Sci. 2024 Jul 25;15:1408047. doi: 10.3389/fpls.2024.1408047. eCollection 2024.
In both plant breeding and crop management, interpretability plays a crucial role in instilling trust in AI-driven approaches and in providing actionable insights. The primary objective of this research is to explore and evaluate the potential contributions of deep learning network architectures that employ stacked LSTM for end-of-season maize grain yield prediction. A secondary aim is to expand the capabilities of these networks by adapting them to better accommodate and leverage the multi-modality properties of remote sensing data. In this study, a multi-modal deep learning architecture that assimilates inputs from heterogeneous data streams, including high-resolution hyperspectral imagery, LiDAR point clouds, and environmental data, is proposed to forecast maize crop yields. The architecture includes attention mechanisms that assign varying levels of importance to different modalities and temporal features that reflect the dynamics of plant growth and environmental interactions. The interpretability of the attention weights is investigated in multi-modal networks that seek to both improve predictions and attribute crop yield outcomes to genetic and environmental variables. This approach also contributes to increased interpretability of the model's predictions. The temporal attention weight distributions highlighted relevant factors and critical growth stages that contribute to the predictions. The results of this study affirm that the attention weights are consistent with recognized biological growth stages, thereby substantiating the network's capability to learn biologically interpretable features. Accuracies of the model's predictions of yield ranged from 0.82 to 0.93 in this genetics-focused study, further highlighting the potential of attention-based models. Further, this research facilitates understanding of how multi-modality remote sensing aligns with the physiological stages of maize.
The proposed architecture shows promise in improving predictions and offering interpretable insights into the factors affecting maize crop yields, while demonstrating the impact of data collected by different modalities throughout the growing season. By identifying relevant factors and critical growth stages, the model's attention weights provide valuable information that can be used in both plant breeding and crop management. The consistency of attention weights with biological growth stages reinforces the potential of deep learning networks in agricultural applications, particularly in leveraging remote sensing data for yield prediction. To the best of our knowledge, this is the first study to investigate the use of hyperspectral and LiDAR UAV time series data both for interpreting plant growth stages within deep learning networks and for forecasting plot-level maize grain yield using late fusion of modalities with attention mechanisms.
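The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: attention over time steps and late fusion across modalities. It is not the authors' network. The dot-product scoring, the weighted-sum fusion, and all names (`temporal_attention`, `late_fusion`, `query`, `modality_scores`) are assumptions made for illustration; in a trained model the hidden states would come from the stacked LSTM and the scoring parameters would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, query):
    """Pool T per-date feature vectors into one context vector.

    hidden_states: list of T vectors (stand-ins for per-date LSTM outputs).
    query: scoring vector (hypothetical; would be learned in practice).
    Returns (context, weights); the weights sum to 1 over the T dates and
    are the quantities one would inspect against growth stages.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


def late_fusion(contexts, modality_scores):
    """Combine one context vector per modality with softmax weights.

    Assumption: fusion is a weighted sum of equally sized vectors; the
    paper's fusion layer may differ (for example, concatenation).
    """
    weights = softmax(modality_scores)
    dim = len(contexts[0])
    fused = [
        sum(w * ctx[d] for w, ctx in zip(weights, contexts)) for d in range(dim)
    ]
    return fused, weights


if __name__ == "__main__":
    # Toy inputs: three flight dates, two features per date (made-up numbers).
    hyperspectral = [[0.2, 0.1], [0.6, 0.4], [0.9, 0.8]]
    lidar = [[0.1, 0.3], [0.5, 0.5], [0.7, 0.6]]
    hs_ctx, hs_w = temporal_attention(hyperspectral, [1.0, 1.0])
    li_ctx, li_w = temporal_attention(lidar, [1.0, 1.0])
    fused, modality_w = late_fusion([hs_ctx, li_ctx], [0.0, 0.0])
    print("temporal weights (hyperspectral):", hs_w)
    print("modality weights:", modality_w)
    print("fused features:", fused)
```

In a sketch like this, the per-date weights returned by `temporal_attention` are what would be compared against known maize growth stages, and the weights returned by `late_fusion` indicate how much each modality contributes to the fused representation.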