Ermiş Sıtkı, Ercan Uğur, Kabaş Aylin, Kabaş Önder, Moiceanu Georgiana
Department of Horticulture, Faculty of Agriculture, Eskişehir Osmangazi University, Eskişehir 26040, Türkiye.
Department of Informatics, Akdeniz University, Antalya 07070, Türkiye.
Foods. 2025 Apr 25;14(9):1498. doi: 10.3390/foods14091498.
Ornamental pumpkin (L. var.) seeds are highly variable in morphology, which makes their classification a complex task for the seed industry. Efficient and accurate classification is critical for agricultural production, breeding programs, and commercial seed sorting. This study compares three machine learning models for classifying ornamental pumpkin seeds: Random Forest (RF), LightGBM, and k-Nearest Neighbors (KNN). The models use morphological characteristics (mass, elongation, width, thickness) and colorimetric characteristics (L*, a*, and b* values in the CIELAB color space) as features. Before model training, the data set was preprocessed through normalization and balancing to improve classification performance. Six ornamental pumpkin seed types were used, 900 seeds in total (150 each of SDE0619, SDE1020, SDE1620, SDE2621, SDE4521, and SDE7721). Classification performance was evaluated with Accuracy, Balanced Accuracy, Precision, Recall, F1 Score, Matthews Correlation Coefficient (MCC), and Cohen's Kappa. The RF model performed best, with an Accuracy of 0.959, Balanced Accuracy of 0.961, Precision (Macro) of 0.962, Recall (Macro) of 0.961, F1 Score (Macro) of 0.961, MCC of 0.951, and Cohen's Kappa of 0.951. In contrast, the KNN model performed worst on every evaluation metric. These results highlight the potential of machine learning-based approaches for automating seed classification, minimizing classification errors, and improving efficiency in the seed industry.
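All seven evaluation metrics can be derived from a single multiclass confusion matrix. The following is a minimal, dependency-free Python sketch of those definitions, not necessarily how the study computed them. The function names (`confusion_matrix`, `safe_div`, `classification_metrics`) and the toy predictions are hypothetical illustrations, not the authors' code or data. Macro averages are unweighted means over classes, and MCC uses the multiclass generalization (Gorodkin's R_K statistic) computed from the confusion-matrix marginals.

```python
import math


def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    """Compute the seven multiclass metrics named in the abstract (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_matrix(y_true, y_pred, labels)
    k = len(labels)
    n = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_totals = [sum(cm[i]) for i in range(k)]                       # row sums
    pred_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    # Per-class precision, recall, and F1 (undefined ratios count as 0.0).
    precision = [safe_div(cm[i][i], pred_totals[i]) for i in range(k)]
    recall = [safe_div(cm[i][i], true_totals[i]) for i in range(k)]
    f1 = [safe_div(2 * p * r, p + r) for p, r in zip(precision, recall)]

    # Balanced accuracy: mean recall over the classes that occur in y_true.
    present = [r for r, t in zip(recall, true_totals) if t > 0]

    # Cohen's kappa: observed agreement vs. agreement expected by chance.
    p_o = correct / n
    p_e = sum(t * p for t, p in zip(true_totals, pred_totals)) / n ** 2

    # Multiclass MCC (Gorodkin's R_K) from the confusion-matrix marginals.
    cov_tp = correct * n - sum(t * p for t, p in zip(true_totals, pred_totals))
    cov_pp = n ** 2 - sum(p * p for p in pred_totals)
    cov_tt = n ** 2 - sum(t * t for t in true_totals)

    return {
        "accuracy": p_o,
        "balanced_accuracy": sum(present) / len(present),
        "precision_macro": sum(precision) / k,
        "recall_macro": sum(recall) / k,
        "f1_macro": sum(f1) / k,
        "mcc": safe_div(cov_tp, math.sqrt(cov_tt * cov_pp)),
        "cohen_kappa": safe_div(p_o - p_e, 1 - p_e),
    }


# Hypothetical toy predictions for three of the study's seed types.
y_true = ["SDE0619", "SDE0619", "SDE1020", "SDE1020", "SDE1620", "SDE1620"]
y_pred = ["SDE0619", "SDE1020", "SDE1020", "SDE1020", "SDE1620", "SDE0619"]
for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

By definition, Balanced Accuracy equals macro-averaged Recall whenever every class appears in the true labels, which is consistent with both being reported as 0.961 for the RF model.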
The high classification performance of the Random Forest model (95.9% accuracy, MCC of 0.951) shows that artificial intelligence-based automatic classification of ornamental pumpkin seeds from their morphological and colorimetric characteristics can contribute substantially to the seed industry. Integrating this approach into seed sorting and quality assessment could improve the accuracy of these processes and support effective breeding schemes for optimal seed selection.