USDA Forest Service, Forest Products Laboratory, Madison, WI 53726, USA.
Department of Biological Systems Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA.
Sensors (Basel). 2024 Mar 21;24(6):1992. doi: 10.3390/s24061992.
Near-infrared (NIR) spectroscopy is widely used as a nondestructive evaluation (NDE) tool for predicting wood properties. Deploying NIR models requires representative training data; large datasets help ensure this, but often at significant cost. Machine learning and deep learning NIR models are at an even greater disadvantage because they typically require larger sample sizes for training. In this study, NIR spectra were collected to predict the modulus of elasticity (MOE) of southern pine lumber (training set = 573 samples, testing set = 145 samples). To compensate for the limited size of the training data, a generative adversarial network (GAN) was used to generate synthetic NIR spectra. The training dataset was fed into the GAN to generate 313, 573, and 1000 synthetic spectra. The original and augmented datasets were used to train artificial neural networks (ANNs), convolutional neural networks (CNNs), and light gradient boosting machines (LGBMs) for MOE prediction. Overall, data augmentation using the GAN improved the coefficient of determination (R²) by up to 7.02% and reduced the prediction error by up to 4.29%. ANNs and CNNs benefited more from the synthetic spectra than LGBMs, which showed only slight improvement. All models performed best when 313 synthetic spectra were added to the original training data. Further additions did not improve model performance because the quality of the datapoints generated by the GAN is poor beyond a certain threshold; one of the main reasons may be the limited size of the initial training data fed into the GAN. LGBMs outperformed ANNs and CNNs on both the original and augmented training datasets, which highlights the importance of selecting an appropriate machine learning or deep learning model for NIR spectral-data analysis.
The results highlighted the positive impact of GAN on the predictive performance of models utilizing NIR spectroscopy as an NDE technique and monitoring tool for wood mechanical-property evaluation. Further studies should investigate the impact of the initial size of training data, the optimal number of generated synthetic spectra, and machine learning or deep learning models that could benefit more from data augmentation using GANs.
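The augmentation sizes and evaluation metrics described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it contains no GAN and no NIR data. The helper names (`augment`, `r_squared`, `rmse`, `percent_change`) are hypothetical, the spectra are placeholders, and reading the reported percentages as relative changes against the unaugmented baseline is an assumption that the abstract does not confirm.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error of the predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def augment(real, synthetic, n_synthetic):
    """Return the real samples followed by the first n_synthetic synthetic ones."""
    if n_synthetic > len(synthetic):
        raise ValueError("not enough synthetic samples available")
    return list(real) + list(synthetic[:n_synthetic])


def percent_change(baseline, new):
    """Relative change in percent (assumed reading of the reported gains)."""
    return 100.0 * (new - baseline) / baseline


# Placeholder samples standing in for spectra; only the counts come from the abstract.
real_train = [("real", i) for i in range(573)]
synthetic_pool = [("synthetic", i) for i in range(1000)]

# Training-set size for the baseline and each augmentation level.
train_sizes = {
    n: len(augment(real_train, synthetic_pool, n)) for n in (0, 313, 573, 1000)
}
```

In a real experiment, each augmented set would be used to fit a model, and `r_squared` and `rmse` would be computed on the same 145-sample test set so that the augmentation levels remain comparable.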