Mekler Alexey, Schwarz Dmitri
The Bonch-Bruevich Saint-Petersburg State University of Telecommunications, 61, Moika, 191186 Saint-Petersburg, Russia.
J Biomed Inform. 2014 Oct;51:210-8. doi: 10.1016/j.jbi.2014.06.001. Epub 2014 Jun 9.
An important aspect of the data classification problem is selecting the most appropriate features. The set of variables should be small and, at the same time, should discriminate reliably between the classes. A method for evaluating discriminating power that allows different sets of variables to be compared would therefore be useful when searching for such a set.
A new approach to feature selection is presented. Two methods are proposed for evaluating the discriminating power of a feature set. Both use self-organizing maps (SOMs) and newly introduced indices of the degree of data clustering on the SOM. The first method compares intraclass and interclass distances on the map. The second evaluates the relative number of nearest neighbors of each best matching unit (BMU) that belong to the same class. Both methods make it possible to evaluate the discriminating power of a feature set in cases where the set provides nonlinear discrimination of the classes.
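The two kinds of index described above can be illustrated with a minimal sketch. This is not the authors' code, and the abstract does not give the exact formulas. The small NumPy SOM, the function names (`train_som`, `bmu_positions`, `distance_ratio`, `same_class_neighbor_fraction`), the ratio form of the first index, the k-nearest-samples form of the second, and the XOR-like toy data are all assumptions made for illustration.

```python
import numpy as np

def train_som(data, rows=6, cols=6, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM online; returns (weights, grid coordinates)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node weights from randomly chosen samples.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.5     # neighborhood radius shrinks
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))    # Gaussian neighborhood on the map
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights, grid

def bmu_positions(data, weights, grid):
    """Map coordinates of the best matching unit (BMU) of every sample."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return grid[np.argmin(d, axis=1)]

def distance_ratio(pos, labels):
    """Assumed index 1: mean intraclass / mean interclass distance on the map."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return d[same & off_diag].mean() / d[~same].mean()

def same_class_neighbor_fraction(pos, labels, k=5):
    """Assumed index 2: share of each sample's k nearest map neighbors in its class."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                     # a sample is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    return (labels[nn] == labels[:, None]).mean()

# Toy data (hypothetical): columns 0-1 separate the classes in an XOR-like,
# nonlinear way; columns 2-3 are pure noise.
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [4, 4], [0, 4], [4, 0]], dtype=float)
labels = np.repeat([0, 0, 1, 1], 50)
informative = np.repeat(centers, 50, axis=0) + rng.normal(0, 0.4, (200, 2))
data = np.hstack([informative, rng.normal(0, 1.0, (200, 2))])

scores = {}
for name, cols in {"informative": [0, 1], "noise": [2, 3]}.items():
    x = data[:, cols]
    weights, grid = train_som(x)
    pos = bmu_positions(x, weights, grid)
    scores[name] = (distance_ratio(pos, labels),
                    same_class_neighbor_fraction(pos, labels))
    print(name, scores[name])
```

Under these assumptions, a lower distance ratio and a higher same-class neighbor fraction both suggest a more discriminating feature set, so candidate sets can be ranked by training one map per set. The two indices need not agree on every data set, which may be one reason to compute both.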
Program code implementing the current algorithms, together with the supporting data files, can be downloaded for free at http://mekler.narod.ru/Science/Articles_support.html.