Wang Huihui, Che Xiaoxue, Nan Jiaxuan, Miao Yuyuan, Wang Yaqi, Zhang Wuping, Li Fuzhong, Han Jiwan
Software College, Shanxi Agricultural University, Taigu, Shanxi, China.
Front Plant Sci. 2025 Jul 8;16:1604088. doi: 10.3389/fpls.2025.1604088. eCollection 2025.
The optimal harvest period for buckwheat is challenging to determine due to its short growth cycle. Harvesting too early or too late can negatively affect the quality of the crop. Traditional harvest methods are labor-intensive and fail to account for the spatial variability in buckwheat quality within a field. This study explores the use of near-infrared (NIR) spectral data to classify the maturity stages of buckwheat.
Four distinct developmental stages were examined: UM (Unripe Maturity), harvested at 65 days after sowing; HM (Half Maturity), harvested at 75 days; MS (Full Maturity with Shell), harvested at 85 days with husks intact; and MUS (Full Maturity Unhulled Sample), also harvested at 85 days but manually dehulled. Because traditional machine learning models require diverse and extensive datasets, this study investigates the use of a conditional WGAN-GP (Wasserstein generative adversarial network with gradient penalty) to generate synthetic spectra and improve model performance. Four machine learning classifiers were evaluated: Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbors (KNN), and Partial Least Squares Linear Discriminant Analysis (PLS-LDA).
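For readers unfamiliar with the conditional WGAN-GP, its training objective is usually written as below. This is the standard formulation from the general WGAN-GP literature with the class label added as a condition, not an equation taken from this study. The symbols are assumptions introduced here: $D$ is the critic, $G$ the generator, $x$ a real spectrum, $y$ its maturity label (UM, HM, MS, or MUS), $z$ a noise vector, and $\lambda$ the penalty weight (commonly set to 10; the value used in the study is not stated in the abstract).

```latex
% Critic loss: Wasserstein estimate plus gradient penalty
L_D = \mathbb{E}_{z,\,y}\big[D(G(z \mid y) \mid y)\big]
    - \mathbb{E}_{x,\,y}\big[D(x \mid y)\big]
    + \lambda\, \mathbb{E}_{\hat{x},\,y}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x} \mid y) \rVert_2 - 1\big)^2\Big]

% Interpolated sample on which the penalty is evaluated
\hat{x} = \epsilon\, x + (1 - \epsilon)\, G(z \mid y), \qquad \epsilon \sim U[0, 1]

% Generator loss
L_G = -\,\mathbb{E}_{z,\,y}\big[D(G(z \mid y) \mid y)\big]
```

The penalty term pushes the critic's gradient norm toward 1 along lines between real and generated spectra, which is what replaces weight clipping in the original WGAN.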
The conditional WGAN-GP was trained for 1,000, 2,000, 8,000, 10,000, and 20,000 epochs. After 10,000 epochs, the synthetic hyperspectral reflectance data closely resembled the real spectra for each maturity category. To assess the effect of conditional WGAN-GP data augmentation, the models were first evaluated on the original dataset as a baseline; PLS-LDA performed best, with an accuracy of 95% and a kappa coefficient of 0.93. The models were then trained on a combination of original and synthetic data, which improved classification performance for RF and KNN. RF achieved the best overall performance, with an accuracy of 97% and a kappa coefficient of 0.94. These results demonstrate that synthetic data can enhance classification accuracy.
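The two metrics reported above, accuracy and Cohen's kappa, can be computed from predicted and true labels as in the following minimal sketch. The function name `accuracy_and_kappa` and the toy labels are hypothetical and invented for illustration; they are not the study's data or code.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of samples classified correctly.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: for each class, the product of its marginal
    # frequencies in the true and predicted labels, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Degenerate case: a single class in both sequences.
        return observed, 1.0
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy example (invented labels): five samples per maturity class,
# with one HM sample misclassified as MS.
y_true = ["UM"] * 5 + ["HM"] * 5 + ["MS"] * 5 + ["MUS"] * 5
y_pred = ["UM"] * 5 + ["HM"] * 4 + ["MS"] * 6 + ["MUS"] * 5

acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")  # accuracy=0.95, kappa=0.93
```

Kappa corrects accuracy for the agreement expected by chance, which is why it is lower than accuracy here: with four balanced classes, chance agreement is 0.25.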