Wang Cong, Fu Yufeng, Wan Ran, Zhao Le, Wang Hongbo, Guo Junwei, Liu Qiang, Li Shan, Ma Shengtao, Wang Zhicai, Huang Wei, Liu Huimin, Yang Song, Nie Cong
Key Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of China National Tobacco Corporation (CNTC), Zhengzhou, China.
Technology Center, China Tobacco Henan Industrial Co., Ltd., Zhengzhou, China.
Front Plant Sci. 2025 Aug 20;16:1597673. doi: 10.3389/fpls.2025.1597673. eCollection 2025.
Image and near-infrared (NIR) spectroscopic data are widely used to construct analytical models in precision agriculture. Model interpretation can provide valuable insights for quality control and improvement, but when raw data are used directly, the inherent ambiguity of individual image pixels or spectral data points often limits practical interpretability. Furthermore, imbalanced datasets can lead to model overfitting and, consequently, poor robustness. Developing alternative approaches for building interpretable and robust models from these data types is therefore crucial.
This study proposes using preprocessed data, specifically morphological features extracted from images and chemical component concentrations predicted from NIR spectra, to build multiclass identification models. Combined-kernel support vector machine (SVM) models were proposed to identify rice varieties and the cultivation regions of tobacco. The kernel parameters and the proportions of the different kernel function types were determined by particle swarm optimization (PSO), which makes the approach self-adaptive. Feature importance and contribution analyses were conducted using SHapley Additive exPlanations (SHAP).
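The abstract does not state which kernels are combined or how PSO is configured. As a minimal sketch, assuming a convex combination of an RBF kernel and a polynomial kernel (a hypothetical choice), the mixing weight and the kernel width are the kind of quantities a PSO search would tune. A convex combination of valid kernels is itself a valid kernel, so any weight in [0, 1] yields a usable SVM kernel. `combined_kernel` and `pso_minimize` are illustrative names, not the authors' code.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def combined_kernel(x: Sequence[float], z: Sequence[float], weight: float,
                    gamma: float, degree: int = 2, coef0: float = 1.0) -> float:
    """Convex mix of an RBF and a polynomial kernel.

    The RBF + polynomial pairing is an assumption for illustration;
    `weight` in [0, 1] is the share of the RBF kernel.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    rbf = math.exp(-gamma * sq_dist)
    poly = (sum(a * b for a, b in zip(x, z)) + coef0) ** degree
    return weight * rbf + (1.0 - weight) * poly


def pso_minimize(objective: Callable[[List[float]], float],
                 bounds: List[Tuple[float, float]],
                 n_particles: int = 20, n_iter: int = 100,
                 inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Basic global-best particle swarm optimization inside box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clip so parameters (e.g. a kernel weight in [0, 1]) stay valid.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Toy objective standing in for a cross-validated error rate:
    # parameters are [kernel weight, gamma], with a known optimum.
    best, best_val = pso_minimize(
        lambda p: (p[0] - 0.3) ** 2 + (p[1] - 2.0) ** 2,
        bounds=[(0.0, 1.0), (0.01, 10.0)],
    )
    print(best, best_val)
```

In an actual pipeline, the objective passed to `pso_minimize` would presumably be the cross-validated misclassification rate of an SVM built with `combined_kernel` (and the penalty parameter could be added as a further dimension); the toy quadratic here only checks that the swarm converges.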
The resulting models demonstrated high robustness and accuracy, achieving classification success rates of 97.9% and 97.4% in n-fold cross-validation on the rice and tobacco datasets, respectively, and 97.7% on an independent test set (tobacco dataset 2). The SHAP analysis identified key variables and elucidated their specific contributions to the model predictions.
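SHAP attributes a single prediction to the input features using Shapley values from cooperative game theory. The sketch below computes them exactly by enumerating every feature coalition, assuming (as one common convention) that features absent from a coalition take baseline values. `exact_shapley` and the toy model are hypothetical illustrations, not the authors' code; the enumeration cost grows exponentially with the number of features, which is why practical SHAP implementations typically use approximations such as Kernel SHAP or model-specific algorithms for larger feature sets.

```python
import itertools
import math
from typing import Callable, List, Sequence


def exact_shapley(predict: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Assumption: a feature outside the coalition is set to its baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in itertools.combinations(others, size):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = (math.factorial(size) * math.factorial(n - size - 1)
                          / math.factorial(n))
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


if __name__ == "__main__":
    # Hypothetical score: a linear term plus a two-feature interaction.
    model = lambda v: 2 * v[0] + v[1] * v[2]
    print(exact_shapley(model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

For this toy model the linear term is credited entirely to the first feature, while the interaction term is split equally between the second and third features, and the values sum to the difference between the prediction and the baseline prediction. For a classifier, `predict` would typically return a class probability or decision score.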
This study expands the applicability of image and NIR spectroscopic data, offering researchers an effective methodology for investigating factors crucial to the quality control and improvement of agricultural products.