Comparative Study of Several Machine Learning Algorithms for Classification of Unifloral Honeys.

Authors

Mateo Fernando, Tarazona Andrea, Mateo Eva María

Affiliations

Department of Electronic Engineering, ETSE, University of Valencia, 46100 Burjasot, Spain.

Department of Microbiology and Ecology, University of Valencia, 46100 Burjasot, Spain.

Publication

Foods. 2021 Jul 3;10(7):1543. doi: 10.3390/foods10071543.

Abstract

Unifloral honeys are in high demand among honey consumers, especially in Europe. To verify that a honey belongs to a highly valued botanical class, the classical methodology is palynological analysis, which identifies and counts pollen grains. This task requires highly trained personnel, which complicates the characterization of the botanical origin of honey. Organoleptic assessment of honey by expert personnel helps to confirm such a classification. In this study, the ability of different machine learning (ML) algorithms to correctly classify seven types of Spanish honeys of single botanical origin (rosemary, citrus, lavender, sunflower, eucalyptus, heather and forest honeydew) was compared. The botanical origin of the samples was ascertained by pollen analysis complemented with organoleptic assessment. The dataset was built from physicochemical parameters of the unifloral honeys, such as electrical conductivity, pH, water content, carbohydrates and color. The following ML algorithms were tested: penalized discriminant analysis (PDA), shrinkage discriminant analysis (SDA), high-dimensional discriminant analysis (HDDA), nearest shrunken centroids (PAM), partial least squares (PLS), C5.0 tree, extremely randomized trees (ET), weighted k-nearest neighbors (KKNN), artificial neural networks (ANN), random forest (RF), support vector machine (SVM) with linear and radial kernels, and extreme gradient boosting trees (XGBoost). The ML models were tuned by repeated 10-fold cross-validation, primarily on the basis of log loss or accuracy, and their performance was then compared on a test set to select the best predictive model. Models built with PDA achieved the best overall accuracy on the test set. ANN, ET, RF and XGBoost models also performed well, whereas SVM performed worst.
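
The tuning step described above (repeated 10-fold cross-validation scored on log loss or accuracy) can be illustrated with a minimal plain-Python sketch for one of the listed model families, a weighted k-nearest-neighbors classifier. This is not the authors' code, and several assumptions are made here: neighbors vote with simple inverse-distance weights (weighted k-NN implementations such as R's kknn package offer kernel-based weights instead), the number of repeats (3), the candidate neighbor counts and all function names are hypothetical, and the two-feature toy data are synthetic rather than the honey measurements. The final comparison on a held-out test set is not shown.

```python
import math
import random
from collections import Counter


def log_loss(y_true, probas, classes, eps=1e-15):
    """Multiclass log loss: mean of -ln p(true class), p clipped to [eps, 1 - eps]."""
    col = {c: i for i, c in enumerate(classes)}
    total = 0.0
    for label, p in zip(y_true, probas):
        q = min(max(p[col[label]], eps), 1 - eps)
        total -= math.log(q)
    return total / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def repeated_kfold(n, n_folds=10, repeats=3, seed=0):
    """Yield (train, test) index lists; each repeat reshuffles 0..n-1 into n_folds folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(n_folds):
            test = order[f::n_folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def weighted_knn_proba(train_X, train_y, x, k, classes):
    """Class probabilities from the k nearest neighbors, each voting with weight 1/distance.

    Assumption: inverse-distance weights stand in for the kernel weights of kknn-style models.
    """
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))[:k]
    votes = Counter()
    for d, yi in nearest:
        votes[yi] += 1.0 / (d + 1e-9)  # small offset avoids division by zero
    total = sum(votes.values())
    return [votes[c] / total for c in classes]


def cross_validate_knn(X, y, k_neighbors, folds=10, repeats=3, seed=0):
    """Mean log loss and mean accuracy over all folds of all repeats."""
    classes = sorted(set(y))
    losses, accs = [], []
    for train, test in repeated_kfold(len(X), folds, repeats, seed):
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        probas = [weighted_knn_proba(tX, ty, X[i], k_neighbors, classes) for i in test]
        preds = [classes[p.index(max(p))] for p in probas]  # argmax class
        truth = [y[i] for i in test]
        losses.append(log_loss(truth, probas, classes))
        accs.append(accuracy(truth, preds))
    return sum(losses) / len(losses), sum(accs) / len(accs)


def tune_k(X, y, candidates=(1, 3, 5, 7, 9)):
    """Pick the neighbor count with the lowest cross-validated log loss (hypothetical grid)."""
    scores = {k: cross_validate_knn(X, y, k) for k in candidates}
    best = min(scores, key=lambda k: scores[k][0])
    return best, scores


# Hypothetical toy data: two standardized features per sample, three honey classes.
rng = random.Random(42)
centers = {"rosemary": (0.0, 0.0), "heather": (3.0, 3.0), "honeydew": (0.0, 3.0)}
X, y = [], []
for label, (cx, cy) in centers.items():
    for _ in range(20):
        X.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        y.append(label)

best_k, scores = tune_k(X, y)
print("best k:", best_k, "CV log loss / accuracy:", scores[best_k])
```

Selecting on log loss rather than accuracy rewards well-calibrated class probabilities, not just correct labels. Changing the key in `tune_k` from `scores[k][0]` to `-scores[k][1]` would select on accuracy instead.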

Figure image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6d9/8303996/9b716bdfbf94/foods-10-01543-g001.jpg