IEEE Trans Cybern. 2013 Aug;43(4):1166-77. doi: 10.1109/TSMCB.2012.2225832.
Of the many attributes or features present in real-life data sets, only a small fraction are effective for representing the data accurately. Selecting or extracting relevant and significant features before analysis is therefore an important preprocessing step in pattern recognition, data mining, and machine learning. This paper presents a novel dimensionality reduction method, based on fuzzy-rough sets, that simultaneously selects attributes and extracts features using the concept of feature significance. The method maximizes both the relevance and the significance of the reduced feature set, thereby removing redundancy from it. The paper also uses classical and neighborhood rough sets to compute the relevance and significance of the feature set and compares their performance with that of fuzzy-rough sets in terms of the predictive accuracy of the nearest neighbor rule, support vector machine, and decision tree. An important finding is that the proposed fuzzy-rough-set-based method is more effective at generating a relevant and significant feature subset. The effectiveness of the proposed method is demonstrated on real-life data sets and compared with that of existing attribute selection and feature extraction methods.
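The relevance and significance criterion can be illustrated with classical rough sets on a small discrete table. The sketch below is not the paper's fuzzy-rough algorithm and performs attribute selection only, with no feature extraction. It assumes the standard rough-set dependency degree as the relevance of an attribute, the gain in dependency when one attribute is added to another as its significance, and a greedy score that weights the two equally. The function names, the equal weighting, and the toy data are all hypothetical.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Classical rough-set dependency degree: the fraction of objects whose
    equivalence class under `attrs` carries a single decision value."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return consistent / len(rows)

def significance(rows, candidate, other, decision):
    """Gain in dependency when `candidate` is added to `other`."""
    return (dependency(rows, [other, candidate], decision)
            - dependency(rows, [other], decision))

def select_features(rows, attrs, decision, k):
    """Greedy selection (assumed scoring rule): relevance plus the average
    significance with respect to the attributes already selected."""
    remaining = list(attrs)
    first = max(remaining, key=lambda a: dependency(rows, [a], decision))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(a):
            avg_sig = sum(significance(rows, a, s, decision)
                          for s in selected) / len(selected)
            return dependency(rows, [a], decision) + avg_sig
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy table: "b" duplicates "a", while "c" resolves the
# objects that "a" alone cannot classify.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 0, "c": 1, "d": 0},
    {"a": 1, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 2, "d": 1},
    {"a": 2, "b": 2, "c": 0, "d": 0},
    {"a": 2, "b": 2, "c": 1, "d": 1},
]

print(select_features(ROWS, ["a", "b", "c"], "d", 2))  # ['a', 'c']
```

In this toy table the redundant attribute "b" has zero significance with respect to "a", so the less relevant but complementary attribute "c" is chosen second. This is the sense in which maximizing significance removes redundancy.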