Central Queensland University, Rockhampton 4702, Queensland, Australia.
Spectrochim Acta A Mol Biomol Spectrosc. 2024 Apr 15;311:124003. doi: 10.1016/j.saa.2024.124003. Epub 2024 Feb 9.
This study empirically validates prior claims regarding the superior performance of a Convolutional Neural Network (CNN) model for estimating mango Dry Matter Content (DMC) using Near Infrared (NIR) spectroscopy. The Partial Least Squares (PLS), Artificial Neural Network (ANN), and CNN models employed in the previous publications were compared on an equal footing, i.e., using the same training and test data, with consideration of the effect of other practices employed in those studies, namely outlier removal, training set partitioning, sample ordering, and spectral pretreatment and augmentation. A new benchmark RMSEP of 0.77 %FW was achieved, which differed significantly (P < 0.05) from the previously published best RMSEP for the same independent test set. The CNN model was also shown to be more robust than optimised ANN and PLS models when tested on a new season of fruit, with RMSEPs of 1.18, 2.62, and 1.87 %FW and biases of 0.16, 2.36, and 1.56 %FW for the CNN, ANN, and PLS models, respectively. The combination of model type and data augmentation was important: the CNN model only slightly outperformed the ANN model when only a second derivative pretreatment was used. This requirement highlights the need for chemometric input to model development. Quantifying the sensitivity of neural network model training to the seed used for pseudo-random sequence generation is also recommended. The standard deviation in RMSEP of 50 ANN and CNN models trained with differing random seeds was 0.03 and 0.02 %FW, respectively.
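The metrics reported above (RMSEP, bias, and the spread of RMSEP across random seeds) can be illustrated with a minimal sketch. The function names, the sign convention for bias (predicted minus reference), the use of the sample standard deviation, and all numeric values below are assumptions made for illustration; they are not taken from the study.

```python
import math
import statistics


def rmsep(reference, predicted):
    """Root mean square error of prediction, in the units of the inputs (e.g. %FW)."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def bias(reference, predicted):
    """Mean signed error; assumed convention here is predicted minus reference."""
    return sum(p - r for r, p in zip(reference, predicted)) / len(reference)


def seed_sensitivity(rmsep_per_seed):
    """Sample standard deviation of RMSEP over models trained with different seeds."""
    return statistics.stdev(rmsep_per_seed)


# Hypothetical dry matter values (%FW), invented for this example only.
reference = [15.0, 16.0, 17.0, 18.0]
predicted = [15.5, 16.5, 16.5, 18.5]

# Hypothetical RMSEPs from three training runs that differ only in random seed.
rmsep_per_seed = [0.80, 0.82, 0.78]

print(f"RMSEP: {rmsep(reference, predicted):.2f} %FW")
print(f"Bias:  {bias(reference, predicted):.2f} %FW")
print(f"SD of RMSEP across seeds: {seed_sensitivity(rmsep_per_seed):.2f} %FW")
```

Reporting the seed-to-seed standard deviation alongside a single RMSEP gives a sense of how much of a difference between two models could be explained by initialisation alone.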