Department of Computer Science & Engineering, Dalian University of Technology, Dalian 116024, China.
Talanta. 2010 Sep 15;82(4):1571-5. doi: 10.1016/j.talanta.2010.07.053. Epub 2010 Jul 30.
We applied the random forest method to discriminate among different kinds of cut tobacco. Column contamination, and the resulting deterioration of column efficiency, lowers resolution between runs performed at different times. To reduce this effect, we constructed combined peaks by summing the peaks within a specific elution time interval Δt. When constructing the tree classifiers, both the original peaks and the combined peaks were considered. A data set of 75 samples from three grades of the same tobacco brand was used to evaluate the method. Two parameters of the random forest were optimized using the out-of-bag error, and the relationship between Δt and the classification rate was investigated. The experiments show that partial least squares discriminant analysis was not suitable because of overfitting, and that the random forest with the combined features was more accurate than Naïve Bayes, support vector machines, bootstrap aggregating, and the random forest using only the original features.
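The combined-peak idea can be illustrated with a minimal sketch. The abstract does not specify how the intervals are laid out, so the code below assumes consecutive, non-overlapping windows of width Δt along the elution time axis; the function name `combined_peaks`, the `(retention_time, area)` peak representation, and the two example runs are hypothetical and are not taken from the paper.

```python
import math


def combined_peaks(peaks, delta_t, t_start, t_end):
    """Sum peak areas inside consecutive windows of width delta_t.

    peaks   : list of (retention_time, area) tuples (assumed representation)
    delta_t : window width, in the same time unit as the retention times
    Returns one summed area per window covering [t_start, t_end).
    """
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    n_bins = math.ceil((t_end - t_start) / delta_t)
    sums = [0.0] * n_bins
    for rt, area in peaks:
        if t_start <= rt < t_end:
            # Clamp the index to guard against floating-point edge cases.
            idx = min(int((rt - t_start) // delta_t), n_bins - 1)
            sums[idx] += area
    return sums


# Made-up illustration: the same sample measured on a fresh column (run_a)
# and on a degraded column (run_b), where two neighbouring peaks have
# merged into a single broader peak.
run_a = [(2.4, 10.0), (5.1, 3.0), (5.3, 4.0)]
run_b = [(2.4, 10.0), (5.2, 7.0)]

features_a = combined_peaks(run_a, delta_t=1.0, t_start=0.0, t_end=8.0)
features_b = combined_peaks(run_b, delta_t=1.0, t_start=0.0, t_end=8.0)
print(features_a)
print(features_b)
```

In this toy case the two runs have different original peak lists but identical combined features, which is the kind of robustness to resolution loss that the abstract describes. In a full pipeline the combined features would be appended to the original peak features before the trees are built.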