Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
Sci Rep. 2018 Mar 27;8(1):5226. doi: 10.1038/s41598-018-23394-3.
Analyzing and identifying attributes of produce, such as taxonomy, vendor, and organic status, is vital to verifying product authenticity in a distribution network. Although a variety of analysis techniques have been studied in the past, we present a novel data-centric approach to classifying produce attributes. We applied visible and near-infrared (NIR) spectroscopy to over 75,000 samples across several fruit and vegetable varieties. This yielded classification accuracies of 0.90-0.98 for taxonomy classes and 0.98-0.99 for farmer classes. In the visible spectrum, the most significant factors were variations in produce color due to chlorophyll and anthocyanins. In the infrared spectrum, we observed that varying water and sugar content levels were critical to obtaining high classification accuracies. High-quality spectral data, along with optimal tuning of the support vector machine (SVM) hyperparameters, was also key to achieving high classification accuracies. In addition to demonstrating exceptional accuracies on test data, we explored the insights behind the classifications and identified the highest-performing approaches using cross-validation. We present data collection guidelines, experimental design parameters, and machine learning optimization parameters for replicating studies involving large sample sizes.
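The abstract does not state the kernel, the hyperparameter grid, or the software used, so the following is only a minimal sketch of cross-validated SVM hyperparameter selection under stated assumptions. It uses synthetic two-class "spectra" in place of real measurements, a linear SVM trained with a Pegasos-style stochastic subgradient method, and a small hypothetical grid over the regularization strength `lam`. All function names and parameter values are illustrative and are not taken from the paper.

```python
import random


def make_spectra(n_per_class, n_bands, seed=0):
    """Synthetic stand-in for spectra: two classes that differ in three bands."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_bands)]
            for b in range(3):  # hypothetical class signal in the first bands
                row[b] += 1.5 * label
            X.append(row)
            y.append(label)
    return X, y


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam, epochs=10, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * dot(w, X[i])   # computed before the update
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:               # sample violates the margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1


def cross_val_accuracy(X, y, lam, k=5, seed=0):
    """Mean held-out accuracy over k folds for one value of lam."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], lam)
        correct = sum(1 for i in held_out if predict(w, X[i]) == y[i])
        scores.append(correct / len(held_out))
    return sum(scores) / k


def grid_search(X, y, lams, k=5):
    """Return the lam with the best cross-validated accuracy and all scores."""
    results = {lam: cross_val_accuracy(X, y, lam, k) for lam in lams}
    best = max(results, key=results.get)
    return best, results


X, y = make_spectra(n_per_class=100, n_bands=20)
best_lam, cv_scores = grid_search(X, y, lams=[0.001, 0.01, 0.1, 1.0])
```

Selecting `lam` by held-out accuracy rather than training accuracy is the point of the exercise: each candidate is scored only on samples the model did not see during fitting. For a dataset of the size described above, an established library implementation would be a more practical choice than this hand-written trainer.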