Department of Analytical Chemistry, Faculty of Sciences, IVAGRO, CeiA3, University of Cadiz, 11510 Puerto Real, Spain.
Department of Physical Chemistry, Faculty of Sciences, INBIO, University of Cadiz, Apartado 40, 11510 Puerto Real, Spain.
Sensors (Basel). 2022 May 19;22(10):3852. doi: 10.3390/s22103852.
Fruit juice production is one of the most important sectors in the beverage industry, and adulteration with cheaper juices is very common. This study presents a methodology that combines machine learning models with near-infrared spectroscopy to detect and quantify juice-to-juice adulteration. We evaluated 100% squeezed apple, pineapple, and orange juices, which were adulterated with grape juice at different percentages (5%, 10%, 15%, 20%, 30%, 40%, and 50%). The spectroscopic data were combined with different machine learning tools to develop predictive models for juice quality control. Unsupervised techniques, specifically model-based clustering, revealed a grouping trend of the samples according to the type of juice. Supervised techniques, namely random forest and linear discriminant analysis models, detected the adulterated samples with an accuracy of 98% in the test set. In addition, the Boruta algorithm selected 89 variables as significant for adulterant quantification, and support vector regression achieved a regression coefficient of 0.989 and a root mean squared error of 1.683 in the test set. These results show the suitability of machine learning tools combined with spectroscopic data as a screening method for the quality control of fruit juices. A prototype application has also been developed to share the models with other users and facilitate the detection and quantification of adulteration in juices.
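The abstract reports two kinds of test-set metrics: classification accuracy for detecting adulterated samples and root mean squared error (RMSE) for quantifying the adulterant percentage. The following is a minimal sketch of how such metrics are typically computed, not the authors' implementation. The function names `rmse` and `accuracy` and all sample values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(labels_true) != len(labels_pred) or not labels_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return hits / len(labels_true)


# Hypothetical adulterant percentages (illustrative values, not study data).
true_pct = [0, 5, 10, 20, 50]
pred_pct = [1, 4, 12, 19, 52]

# Hypothetical class labels (illustrative values, not study data).
true_labels = ["pure", "adulterated", "adulterated", "pure"]
pred_labels = ["pure", "adulterated", "pure", "pure"]

quantification_error = rmse(true_pct, pred_pct)
detection_accuracy = accuracy(true_labels, pred_labels)
```

Because RMSE is expressed in the same units as the target, a value such as the reported 1.683 can be read as a typical prediction error in percentage points of adulterant, assuming the regression target is the adulteration percentage.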