Wang X Rosalind, Lizier Joseph T, Nowotny Thomas, Berna Amalia Z, Prokopenko Mikhail, Trowell Stephen C
CSIRO Computational Informatics, Epping, NSW, Australia.
CCNR, School of Engineering and Informatics, University of Sussex, Falmer, Brighton, United Kingdom; CSIRO Ecosystem Sciences and Food Futures Flagship, Canberra, ACT, Australia.
PLoS One. 2014 Mar 4;9(3):e89840. doi: 10.1371/journal.pone.0089840. eCollection 2014.
We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study, which used a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near-optimal features for chemical sensor arrays.
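As an illustration of the filter idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes features have already been discretized, uses a simple plug-in (count-based) estimate of mutual information, and searches all subsets of a fixed size, which is only practical for small feature pools. The function names, the feature names `f1` to `f3`, and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def best_feature_subset(features, labels, k):
    """Return (subset, score) for the size-k subset of feature names whose
    joint value has the highest estimated mutual information with the labels."""
    best = None
    for subset in combinations(sorted(features), k):
        # Treat the tuple of selected feature values as one joint variable.
        joint_values = list(zip(*(features[name] for name in subset)))
        score = mutual_information(joint_values, labels)
        if best is None or score > best[1]:
            best = (subset, score)
    return best


# Made-up toy data: three chemicals, two samples each (assumption, not from the paper).
labels = ["A", "A", "B", "B", "C", "C"]
features = {
    "f1": [0, 0, 1, 1, 1, 1],  # separates A from {B, C}
    "f2": [0, 0, 0, 0, 1, 1],  # separates C from {A, B}
    "f3": [0, 1, 0, 1, 0, 1],  # carries no information about the label
}

subset, score = best_feature_subset(features, labels, 2)
print(subset, round(score, 3))
```

On this toy data the pair `f1`, `f2` jointly identifies the chemical, so it reaches the label entropy of log2(3) bits, while `f3` contributes nothing. Unlike a wrapper approach, no classifier is trained during selection, which is where the computational saving comes from. With real sensor data, the plug-in estimate is biased for small samples, so a bias-corrected or continuous estimator would likely be needed.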