Facultad de Ingeniería de Industrias Alimentarias, Universidad Nacional de Frontera, Sullana 20100, Peru.
Departamento de Ciencias Computacionales e Ingenierías, Universidad de Guadalajara, Ameca 46600, Jalisco, Mexico.
Spectrochim Acta A Mol Biomol Spectrosc. 2022 Apr 5;270:120815. doi: 10.1016/j.saa.2021.120815. Epub 2021 Dec 29.
Near-infrared spectroscopy (NIRS) has proven helpful in the study of rice, tea, cocoa, and other foods because of its versatility and minimal sample treatment. However, the high complexity of the data produced by NIR sensors makes pre-treatments necessary, such as feature selection techniques that produce compact profiles. Both supervised and unsupervised techniques have been tested; they create different feature subsets, which in turn affect the performance of classifiers built on such compact profiles. We therefore propose and test a new covering array feature selection (CAFS) algorithm coupled with the naïve Bayes classifier (NBC) to discriminate among Amazonian cacao nibs from six cacao clones. The CAFS wrapper approach searches for the wavebands that maximize the F-score and are therefore the most relevant for classification. For this purpose, cacao pods of six varieties were collected, and their beans were extracted and processed (fermented, dried, roasted, and milled) to obtain cacao nibs. NIR spectral profiles in the range of 1100-2500 nm were then acquired for each clone, and relevant wavebands were selected using the proposed CAFS algorithm. For comparison, two standard feature selection techniques were implemented: multi-cluster feature selection (MCFS) and eigenvector centrality feature selection (ECFS). Based on the variables selected by each method, three NBCs were built and compared using statistical metrics. With the wavebands selected by CAFS, the NBC achieved an average accuracy of 99.63%, superior to the 94.92% and 95.79% obtained with ECFS and MCFS, respectively. These results show that the wavebands selected by the proposed CAFS algorithm yield a better fit than the other feature selection methods reported in the literature.
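The wrapper idea in the abstract (score candidate waveband subsets with a naïve Bayes classifier and keep the subset with the highest F-score) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation. The abstract does not describe how the covering array is constructed, so random binary masks stand in for its rows here. The Gaussian likelihood, the macro-averaged F-score, the toy data, and every function name (`fit_gaussian_nb`, `select_wavebands`, `make_toy_spectra`, and so on) are assumptions made for the example.

```python
import math
import random
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a log-prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps variances strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest Gaussian log-posterior."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)

def project(row, mask):
    """Keep only the wavebands whose mask entry is True."""
    return [v for v, keep in zip(row, mask) if keep]

def select_wavebands(X_train, y_train, X_val, y_val, n_candidates=200, seed=0):
    """Wrapper search: return the candidate mask with the best validation F-score.

    ASSUMPTION: random masks replace the covering-array rows used by CAFS.
    """
    rng = random.Random(seed)
    n_features = len(X_train[0])
    best_mask, best_score = None, -1.0
    for _ in range(n_candidates):
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        if not any(mask):
            continue  # an empty subset cannot be evaluated
        model = fit_gaussian_nb([project(r, mask) for r in X_train], y_train)
        preds = [predict(model, project(r, mask)) for r in X_val]
        score = macro_f1(y_val, preds)
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score

def make_toy_spectra(n_per_class, n_bands=6, n_classes=3, seed=0):
    """Hypothetical toy data: only band 0 separates the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_bands)]
            row[0] = rng.gauss(5.0 * label, 0.5)
            X.append(row)
            y.append(label)
    return X, y
```

A call such as `select_wavebands(*make_toy_spectra(30, seed=1), *make_toy_spectra(20, seed=2))` returns a boolean mask over the bands together with its validation F-score. In practice the score would more likely be cross-validated, and the candidate subsets would come from the covering array rather than from random sampling.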