IEEE J Biomed Health Inform. 2024 Sep;28(9):5638-5648. doi: 10.1109/JBHI.2024.3409628. Epub 2024 Sep 5.
Feature selection is a critical component of data mining and has garnered significant attention in recent years. However, feature selection methods based on information entropy often introduce complex forms of mutual information to measure features, which increases redundancy and can introduce errors. To address this issue, we propose FSCME, a feature selection method that combines Copula correlation (Ccor) and the maximum information coefficient (MIC) through entropy weights. FSCME considers both the relevance between features and labels and the redundancy between candidate features and already selected features. Specifically, FSCME uses Ccor to measure the redundancy between features and to estimate the relevance between features and labels, and employs MIC to strengthen the credibility of the measured correlation between features and labels. The Entropy Weight Method (EWM) is then used to evaluate Ccor and MIC and assign a weight to each. The experimental results demonstrate that FSCME yields a more effective feature subset for subsequent clustering processes and significantly improves classification performance compared with six other feature selection methods.
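The abstract does not give the formulas behind the entropy weighting, so the following is only a minimal sketch of the textbook Entropy Weight Method, not the authors' implementation. It assumes each feature already has a non-negative Ccor-based and MIC-based relevance score; the score values, the function name `entropy_weights`, and the linear combination of the two measures are all hypothetical choices made for illustration.

```python
import numpy as np


def entropy_weights(scores: np.ndarray) -> np.ndarray:
    """Textbook Entropy Weight Method (assumed form, not taken from the paper).

    scores: array of shape (n_features, n_measures) with non-negative entries
            and a positive sum in every column.
    Returns one weight per measure (column); the weights sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    n_items, n_measures = scores.shape

    # Column-wise proportions p_ij = x_ij / sum_i x_ij.
    p = scores / scores.sum(axis=0, keepdims=True)

    # p * ln(p), with the convention 0 * ln(0) = 0.
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)

    # Normalised entropy of each measure, in [0, 1].
    entropy = -plogp.sum(axis=0) / np.log(n_items)

    # A measure whose scores are more dispersed has lower entropy
    # and therefore receives a larger weight.
    divergence = 1.0 - entropy
    total = divergence.sum()
    if np.isclose(total, 0.0):
        # Every measure is uniform: fall back to equal weights.
        return np.full(n_measures, 1.0 / n_measures)
    return divergence / total


# Hypothetical relevance scores for four features.
# Column 0 stands in for Ccor(feature, label), column 1 for MIC(feature, label).
scores = np.array([
    [0.90, 0.50],
    [0.10, 0.45],
    [0.50, 0.55],
    [0.30, 0.50],
])

weights = entropy_weights(scores)

# Assumed combination rule: a weighted sum of the two measures per feature.
combined = scores @ weights

# Feature indices ordered from most to least relevant under that rule.
ranking = np.argsort(-combined)
```

In this toy input the first column varies much more across features than the second, so it receives the larger weight. A full method along the lines the abstract describes would also subtract a redundancy term between each candidate feature and the features already selected, which this sketch leaves out.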