Huang Jing, Zhang Xuenan, Yang Hang, Li Zhenbiao, Xue Zhengfang, Wang Qingqing, Zhang Xinyuan, Ding Shenghua, Luo Zisheng, Xu Yanqun
College of Biosystems Engineering and Food Science, Key Laboratory of Agro-Products Postharvest Handling of Ministry of Agriculture and Rural Affairs, Zhejiang University, Hangzhou 310058, China.
Innovation Center for Postharvest Agro-Products Technology, Zhejiang University, Hangzhou 310058, China.
Foods. 2025 Jan 8;14(2):169. doi: 10.3390/foods14020169.
Volatile organic compounds (VOCs) are closely associated with the maturity and variety of strawberries. However, the complexity of VOC profiles has hindered their application in strawberry classification. This study developed a classification workflow that combines strawberry VOC profiles with machine learning (ML) models for precise fruit classification. A comprehensive VOC dataset was rapidly collected using gas chromatography-ion mobility spectrometry (GC-IMS) from five strawberry varieties at four maturity stages (n = 300) and visualized through principal component analysis (PCA). Five ML models were developed: partial least squares discriminant analysis (PLS-DA), decision trees, support vector machines (SVM), XGBoost, and neural networks (NN). Accuracy across all models ranged from 90.00% to 98.33%, with the NN model performing best. Specifically, the NN model achieved 96.67% accuracy for maturity-only classification, 98.33% for variety-only classification, and 96.67% for combined maturity and variety classification, along with 98.09% precision, 97.92% recall, and a 97.91% F1 score. Feature importance analysis indicated that the NN model relied on the various VOCs in the most balanced way, which contributed to its strong performance with GC-IMS, a broad-spectrum VOC detection method. Overall, these findings underscore the potential of NN modeling for accurate and efficient fruit classification based on integrated VOC profiles.
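The accuracy, precision, recall, and F1 figures reported in the abstract are all derived from a comparison of predicted and true class labels. The following is a minimal sketch of that calculation, assuming macro averaging across classes (the abstract does not state which averaging scheme was used). The function name `macro_metrics`, the class names, and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption here: each class contributes
    equally, regardless of how many samples it has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the truth but was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up example: three hypothetical classes, 20 samples each,
# with two misclassifications.
y_true = ["A"] * 20 + ["B"] * 20 + ["C"] * 20
y_pred = list(y_true)
y_pred[0] = "B"    # one "A" sample predicted as "B"
y_pred[20] = "C"   # one "B" sample predicted as "C"

for name, value in macro_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```

Under macro averaging, the F1 score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.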