Xia Jiaxu, Tian Yunong, Hao Xianwei, Peng Yuhan, Luo Guanqun, Gan Zhihua
Key Laboratory of Refrigeration and Cryogenic Technology of Zhejiang Province, Zhejiang University, Hangzhou, 310027, China.
Cryogenic Center, Hangzhou City University, Hangzhou, 310015, China.
Biotechnol Biofuels Bioprod. 2025 Aug 11;18(1):90. doi: 10.1186/s13068-025-02682-x.
Biomass is strongly influenced by geographic location, soil composition, environment, and climate, so identifying growing areas efficiently and accurately is of considerable value. This study proposes a classification model for tobacco growing areas based on time series features from thermogravimetric analysis (TGA). The model combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network to process derivative thermogravimetric (DTG) data, with the aim of exploiting the inherent time series properties of the data and the continuous, dynamic relationship between temperatures. The dataset comprises 375 tobacco samples from ten provinces; the CNN extracts local features, while the LSTM captures long-term dependencies in the DTG data. The dataset has a limited sample size, many classes, and an imbalanced number of samples across those classes. Despite these challenges, the model achieves 86.4% accuracy on the test set, well above the 68.2% achieved by a traditional Support Vector Machine model. In addition, the model reveals key temperature ranges that are crucial for growing area classification; these are associated with the pyrolysis temperature ranges of volatile components, hemicellulose, cellulose, lignin, and CaCO3 in the tobacco. This model lays the groundwork for the future use of geographical labels to accurately represent tobacco's style and quality, enabling more precise differentiation and improved quality control.
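As a minimal illustration of the model's input, the DTG curve is the derivative of the TGA mass-loss curve with respect to temperature. The sketch below approximates it by finite differences in plain Python. The function name `dtg_curve` and the four-point sample curve are hypothetical and chosen only for illustration; the abstract does not describe the authors' actual preprocessing, sampling interval, or smoothing.

```python
from typing import List, Sequence


def dtg_curve(temps_c: Sequence[float], mass_pct: Sequence[float]) -> List[float]:
    """Approximate DTG (d mass / d T) from a TGA curve by finite differences.

    Central differences are used at interior points and one-sided
    differences at the two ends. Result units: % mass per degree Celsius.
    """
    n = len(temps_c)
    if n != len(mass_pct):
        raise ValueError("temperature and mass series must have equal length")
    if n < 2:
        raise ValueError("need at least two points to differentiate")

    dtg = []
    for i in range(n):
        lo = max(i - 1, 0)       # previous point, or the point itself at the start
        hi = min(i + 1, n - 1)   # next point, or the point itself at the end
        dt = temps_c[hi] - temps_c[lo]
        if dt <= 0:
            raise ValueError("temperatures must be strictly increasing")
        dtg.append((mass_pct[hi] - mass_pct[lo]) / dt)
    return dtg


# Hypothetical TGA curve: remaining mass (%) at four temperatures (deg C).
temps = [100.0, 200.0, 300.0, 400.0]
mass = [100.0, 95.0, 60.0, 40.0]

dtg = dtg_curve(temps, mass)

# The most negative DTG value marks the fastest mass loss.
peak_index = min(range(len(dtg)), key=lambda i: dtg[i])
peak_temp = temps[peak_index]
```

In a pipeline like the one described, a sequence of this kind (one DTG value per temperature step) would be the series passed to the convolutional and recurrent layers; the peaks of the curve correspond to the component pyrolysis ranges the abstract mentions.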