Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States.
Anal Chem. 2020 Apr 7;92(7):5354-5361. doi: 10.1021/acs.analchem.0c00017. Epub 2020 Mar 17.
A common and significant problem in analytical chemistry is determining whether a sample belongs to a specific class, e.g., establishing whether a food product is genuine or counterfeit or whether a tissue sample is benign or malignant. This problem is termed one-class classification (class modeling). Class modeling poses two difficulties: choosing which one-class classifier to use and then optimizing the chosen classifier, i.e., identifying the best tuning parameter value(s). With spectroscopic data, two further questions arise: which data preprocessing method(s) and which spectral region(s) to use. This paper presents a hybrid fusion process that can combine nonoptimized classifiers across multiple instruments, preprocessing methods, and measurements. Instead of optimizing classifiers, a window of tuning parameter values is used for each classifier. The flexible fusion method of sum of ranking differences (SRD) is applied to combine all assessment values. The only tuning parameter needed is the SRD ranking value (threshold) used to decide class membership, and this value is optimized automatically by using a receiver operating characteristic (ROC) curve. The approach is demonstrated on two analytical data sets. The first is a beer authentication sample set measured on five instruments: near-infrared, mid-infrared (MIR), ultraviolet, visible, and thermogravimetric. Three different fusion protocols of all five instruments are demonstrated. The second data set consists of MIR spectra of strawberry puree with two categories: strawberry puree and nonstrawberry puree. Fusing nonoptimized classifiers provides reliable classifications in terms of accuracy, sensitivity, and specificity.
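The abstract describes the fusion step only in outline, so the following is a minimal sketch of the general idea and not the authors' procedure. It uses the standard SRD definition (rank the rows within each column, then sum the absolute rank differences against a reference column) and picks the SRD threshold from an ROC analysis. Several things here are assumptions made for illustration: rows stand for classifier/tuning-parameter assessment values and columns for samples, the reference profile and the toy numbers are invented, and the ROC operating point is chosen by maximizing Youden's J (TPR minus FPR), a criterion the abstract does not specify. The function names `srd` and `roc_threshold` are hypothetical.

```python
import numpy as np


def srd(scores, reference):
    """Sum of ranking differences for each column of `scores`.

    scores    : (n_rows, n_samples) assessment values, one row per
                classifier/tuning-parameter setting (assumed layout).
    reference : (n_rows,) reference profile (hypothetical choice).
    Ties are not handled here; scipy.stats.rankdata would average them.
    """
    # argsort applied twice turns values into 0-based ranks.
    ranks = np.argsort(np.argsort(scores, axis=0), axis=0)
    ref_ranks = np.argsort(np.argsort(reference))
    return np.abs(ranks - ref_ranks[:, None]).sum(axis=0)


def roc_threshold(srd_vals, is_member):
    """Scan candidate thresholds and keep the one maximizing Youden's J.

    A sample is predicted to be a class member when its SRD value is
    at or below the threshold (lower SRD = closer to the reference).
    """
    best_t, best_j = None, -np.inf
    for t in np.unique(srd_vals):
        predicted = srd_vals <= t
        tpr = np.mean(predicted[is_member])    # sensitivity
        fpr = np.mean(predicted[~is_member])   # 1 - specificity
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (invented): 4 assessment rows, 5 samples given as columns.
columns = [
    [1.1, 2.0, 3.2, 3.9],  # class member
    [0.9, 2.1, 2.8, 4.2],  # class member
    [1.0, 3.0, 2.5, 4.0],  # class member, one rank swap
    [4.0, 3.0, 2.0, 1.0],  # non-member, reversed profile
    [3.0, 4.0, 1.0, 2.0],  # non-member
]
scores = np.array(columns).T
reference = np.array([1.0, 2.0, 3.0, 4.0])
is_member = np.array([True, True, True, False, False])

srd_vals = srd(scores, reference)
threshold, j_stat = roc_threshold(srd_vals, is_member)
print("SRD values:", srd_vals)
print("threshold:", threshold, "Youden's J:", j_stat)
```

In this sketch nothing is tuned per classifier: every row enters the ranking as it is, and the only value chosen from the data is the SRD threshold, which mirrors the abstract's claim that the threshold is the one tuning parameter needed.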