Li Xin, Zhao Chun, Wang Huihui, Zhao Fangfang
Institute of Biomedical Engineering, Yanshan University, Qinhuangdao 066004, China.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2011 Oct;28(5):901-6.
Pattern selection plays an important role in data mining and pattern recognition, especially for large-scale bioinformatic data. Open problems in this field include algorithm complexity and determining the size of the best feature subset. In this paper, we propose a new pattern selection algorithm based on mutual information (MI). The algorithm selects patterns according to the correlation between each pattern and the class label, as well as the redundancy of each pattern. A pattern subset evaluation index, the Neurofuzzy Pattern Subset Evaluation Index, was studied to determine which candidate subset is the best. To verify the effectiveness of our method, several experiments were carried out on mouse gene expression data from Leiden University and on UCI datasets. The experimental results indicated that our algorithm achieved better results in both complexity and accuracy.
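The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes: ranking patterns by their mutual information with the label while penalizing redundancy with patterns already chosen. The scoring rule used here (relevance minus mean redundancy, in the style of greedy relevance/redundancy selection) is an assumption, not the authors' algorithm, and the Neurofuzzy Pattern Subset Evaluation Index is not implemented. The names `mutual_information` and `select_features` and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) pairs of p(x,y) * log(p(x,y) / (p(x) * p(y))).
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(columns, labels, k):
    """Greedily pick k column indices.

    Assumed score: MI(feature, label) minus the mean MI between the
    feature and the features selected so far.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): the label is determined by two independent bits a and b.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]          # independent of the label
labels = [2 * ai + bi for ai, bi in zip(a, b)]
columns = [a, list(a), b, c]          # column 1 duplicates column 0

print(select_features(columns, labels, 2))  # [0, 2]
```

In this toy case the duplicate column 1 is as relevant as column 0 but fully redundant with it, so the second pick is column 2. Continuous data such as gene expression values would need discretization or a different MI estimator before a sketch like this could be applied.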