Kou Ranran, Wang Cong, Liu Jinxia, Wan Ran, Jin Zhe, Zhao Le, Liu Youjie, Guo Junwei, Li Feng, Wang Hongbo, Yang Song, Nie Cong
Key Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of China National Tobacco Corporation (CNTC), Zhengzhou, China.
Technology Center, China Tobacco Jilin Industrial Co., Ltd., Changchun, China.
Front Plant Sci. 2025 Jul 25;16:1619380. doi: 10.3389/fpls.2025.1619380. eCollection 2025.
Tobacco leaf position is closely associated with leaf quality, the material basis of which is the leaf's chemical composition. In recent years, near-infrared (NIR) spectroscopy combined with algorithmic models has become a popular method for identifying tobacco leaf position. However, when applied to leaf position discrimination, these models often rely on principal components derived from dimensionality-reduced spectral signals. This limits their interpretability and makes it difficult to identify key chemical components. Chemical composition data combined with algorithmic models can also be used to discriminate tobacco leaf positions. However, such data are usually acquired with traditional instrumental analytical methods, which are time-consuming and labor-intensive and cover only a limited number of compounds. This study proposes a novel approach that integrates machine learning with advanced interpretability techniques to both discriminate and analyze tobacco leaf positions. Based on 70 chemical components of tobacco leaves obtained with near-infrared rapid analysis technology, leaf position discrimination models were built using a Support Vector Machine (SVM), a Back Propagation Neural Network (BPNN), and a Random Forest (RF). Particle swarm optimization (PSO) was used to optimize the parameters of each model. The chemical components were tested for statistically significant differences across leaf positions, and their influence on model predictions was interpreted using SHapley Additive exPlanations (SHAP). The experimental results showed that, among all models, the SVM with a hybrid kernel was the most robust and accurate, achieving discrimination accuracies of 98.17% and 96.33% on the training and test sets, respectively. SHAP analysis provided a clear ranking of feature importance and revealed the positive and negative contributions of individual chemical components.
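The abstract does not specify the form of the SVM hybrid kernel or the PSO settings. A common choice, assumed here, is a convex combination of an RBF kernel and a polynomial kernel weighted by a mixing parameter lam; PSO can then search over hyperparameters such as the SVM penalty C, gamma, and lam. The Python sketch below shows both pieces under those assumptions. The function names, default values, and the constriction coefficients (w = 0.7298, c1 = c2 = 1.49618) are illustrative choices, not the study's settings.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def hybrid_kernel(x: Sequence[float], y: Sequence[float], lam: float = 0.5,
                  gamma: float = 0.1, degree: int = 2, coef0: float = 1.0) -> float:
    """ASSUMED hybrid kernel: lam * RBF + (1 - lam) * polynomial.

    For 0 <= lam <= 1 a convex combination of two valid kernels is itself a
    valid kernel. The study's exact formulation is not given in the abstract.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    rbf = math.exp(-gamma * sq_dist)
    poly = (sum(a * b for a, b in zip(x, y)) + coef0) ** degree
    return lam * rbf + (1.0 - lam) * poly


def pso_minimize(objective: Callable[[List[float]], float],
                 bounds: List[Tuple[float, float]],
                 n_particles: int = 20, n_iter: int = 100,
                 w: float = 0.7298, c1: float = 1.49618, c2: float = 1.49618,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization inside a box-bounded search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Start particles uniformly inside the bounds with zero velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the personal best + pull toward the global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle, then clip it back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    print(hybrid_kernel([1.0, 0.0], [1.0, 0.0], lam=0.5, gamma=1.0))
    # Stand-in objective: in practice this would be, e.g., 1 - cross-validated
    # accuracy of an SVM built with the candidate (C, gamma, lam).
    best, val = pso_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                             bounds=[(-5.0, 5.0), (-5.0, 5.0)])
    print(best, val)
```

In the study's setting, the objective would return something like one minus the cross-validated accuracy of an SVM trained with the candidate hyperparameters; the quadratic function above only stands in for that. To use the kernel with scikit-learn's `SVC`, it would first need to be wrapped in a function that returns the full Gram matrix between two sample arrays, since `SVC(kernel=callable)` expects a matrix-valued callable.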
The proposed method can be useful for position traceability and chemical feature analysis of various crops.
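SHAP attributes each prediction to the input features using Shapley values from cooperative game theory. A component's value is its average marginal effect on the model output over all coalitions of the other components, weighted by |S|!(n - |S| - 1)!/n!, and its sign gives the positive or negative contribution described in the abstract. A brute-force Python sketch of that definition follows. It assumes an interventional value function in which features outside a coalition are filled in from background rows. The model, feature names, and data in the example are hypothetical. With 70 components, exact enumeration (2^70 coalitions) is infeasible, which is why the `shap` package relies on approximations or model-specific algorithms instead.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   background: List[Sequence[float]]) -> List[float]:
    """Exact interventional Shapley values of model(x) by enumerating all coalitions.

    v(S) = mean over background rows b of model(z), where z takes x's values on
    the features in S and b's values elsewhere. Cost is O(2^n * len(background)),
    so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        total = 0.0
        for b in background:
            z = [x[j] if j in coalition else b[j] for j in range(n)]
            total += model(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi


if __name__ == "__main__":
    # HYPOTHETICAL linear scorer over three illustrative features (not from the study).
    names = ["feature_a", "feature_b", "feature_c"]
    weights = [2.0, -1.0, 0.5]

    def score(z: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(weights, z)) + 0.1

    phi = shapley_values(score, [1.0, 2.0, 3.0], [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
    # Rank features by absolute contribution; the sign shows the direction of the effect.
    for name, p in sorted(zip(names, phi), key=lambda t: -abs(t[1])):
        print(f"{name}: {p:+.3f}")
```

For a linear model, each value reduces to the weight times the feature's deviation from the background mean, and for any model the values sum to the prediction minus the average background prediction. That additivity is what makes a per-component ranking with signed contributions possible.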