Department of Computer Science, Periyar University, Salem 636 011, Tamil Nadu, India.
Comput Methods Programs Biomed. 2014;113(1):175-85. doi: 10.1016/j.cmpb.2013.10.007. Epub 2013 Oct 16.
Medical datasets are often characterized by a large number of disease measurements and a relatively small number of patient records. Not all of these measurements (features) are important; some are irrelevant or noisy. Such features may be especially harmful in relatively small training sets, where irrelevancy and redundancy are harder to evaluate. In addition, a very large number of features increases the memory needed to represent the dataset. Feature Selection (FS) addresses these problems by finding a subset of prominent features that improves predictive accuracy and removes redundant features. The learning model thus receives a concise structure, built from only the selected prominent features, without forfeiting predictive accuracy. FS has therefore become an essential part of knowledge discovery. In this study, new supervised feature selection methods based on hybridization of Particle Swarm Optimization (PSO), namely PSO-based Relative Reduct (PSO-RR) and PSO-based Quick Reduct (PSO-QR), are presented for disease diagnosis. Experimental results on several standard medical datasets demonstrate the efficiency of the proposed methods as well as improvements over existing feature selection techniques.
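The abstract does not spell out the algorithms, so the following is only a minimal sketch of the classical greedy Quick Reduct procedure from rough set theory, not the PSO-QR or PSO-RR hybrids of the study. It illustrates the idea of keeping a small feature subset that preserves how well the features determine the diagnosis. The function names, the feature names (`fever`, `cough`, `age_group`), and the toy records are all hypothetical. A swarm-based variant would presumably use a measure like `dependency` as its fitness function; that is an assumption here, not something the abstract states.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Rough-set dependency degree of `decision` on `attrs`.

    Rows are grouped by their values on `attrs`; the result is the fraction
    of rows that fall in a group with a single decision value.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(
        len(labels) for labels in classes.values() if len(set(labels)) == 1
    )
    return consistent / len(rows)


def quick_reduct(rows, conditional, decision):
    """Greedy Quick Reduct: repeatedly add the attribute that raises the
    dependency degree most, until it matches that of the full attribute set."""
    target = dependency(rows, conditional, decision)
    reduct = []
    best = dependency(rows, reduct, decision)
    while best < target:
        candidates = [a for a in conditional if a not in reduct]
        scores = {a: dependency(rows, reduct + [a], decision) for a in candidates}
        choice = max(candidates, key=lambda a: scores[a])
        reduct.append(choice)
        best = scores[choice]
    return reduct


# Hypothetical toy records: in this data, `fever` alone determines `flu`.
records = [
    {"fever": 1, "cough": 1, "age_group": 0, "flu": 1},
    {"fever": 1, "cough": 0, "age_group": 1, "flu": 1},
    {"fever": 0, "cough": 1, "age_group": 0, "flu": 0},
    {"fever": 0, "cough": 0, "age_group": 1, "flu": 0},
    {"fever": 1, "cough": 1, "age_group": 1, "flu": 1},
    {"fever": 0, "cough": 1, "age_group": 1, "flu": 0},
]

selected = quick_reduct(records, ["fever", "cough", "age_group"], "flu")
print(selected)
```

The greedy search is simple but can settle on a subset that is not minimal, which is one commonly cited motivation for replacing it with a population-based search such as PSO.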