Feature Selection Using Approximate Conditional Entropy Based on Fuzzy Information Granule for Gene Expression Data Classification

Author

Zhang Hengyi

Affiliation

College of Animal Science and Technology, Northwest A&F University, Yangling, China.

Publication

Front Genet. 2021 Mar 30;12:631505. doi: 10.3389/fgene.2021.631505. eCollection 2021.

Abstract

Classification is widely used in gene expression data analysis. Because gene expression data contain a large number of genes but few samples, feature selection is usually performed before classification. This article proposes a novel feature selection algorithm that uses approximate conditional entropy based on fuzzy information granules, and proves the correctness of the method through the monotonicity of the entropy. First, a fuzzy relation matrix is established with the Laplacian kernel. Second, an approximately equal relation on fuzzy sets is defined. Third, the approximate conditional entropy based on fuzzy information granules and the importance of internal attributes are defined. The approximate conditional entropy measures the uncertainty of knowledge from two perspectives: information theory and algebra. Finally, a greedy algorithm based on the approximate conditional entropy is designed for feature selection. Experimental results on six large-scale gene datasets show that the algorithm not only greatly reduces the dimensionality of the datasets but also outperforms five state-of-the-art algorithms in classification accuracy.
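The pipeline outlined in the abstract (Laplacian-kernel fuzzy relation, an entropy-based criterion, greedy selection) can be illustrated with a rough sketch. The abstract does not give the definitions of the approximately equal relation or of the approximate conditional entropy, so the code below substitutes a commonly used fuzzy conditional entropy as a stand-in and should not be read as the paper's method. All function names, the kernel width `sigma`, the stopping tolerance `tol`, and the toy data are assumptions made for illustration only.

```python
import math

def laplacian_relation(column, sigma=1.0):
    """Fuzzy relation for one feature: R[i][j] = exp(-|x_i - x_j| / sigma)."""
    return [[math.exp(-abs(a - b) / sigma) for b in column] for a in column]

def combine(r1, r2):
    """Intersect two fuzzy relations element-wise (min t-norm)."""
    return [[min(a, b) for a, b in zip(row1, row2)]
            for row1, row2 in zip(r1, r2)]

def label_relation(labels):
    """Crisp equivalence relation induced by the class labels."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]

def conditional_entropy(rel, dec):
    """Stand-in criterion (NOT the paper's approximate conditional entropy):
    H(D|B) = -(1/n) * sum_i log2(|[x_i]_B ∩ [x_i]_D| / |[x_i]_B|)."""
    total = 0.0
    for row_b, row_d in zip(rel, dec):
        card_b = sum(row_b)
        card_bd = sum(min(a, b) for a, b in zip(row_b, row_d))
        total -= math.log2(card_bd / card_b)
    return total / len(rel)

def greedy_select(X, labels, sigma=1.0, tol=1e-6):
    """Forward selection: repeatedly add the feature that lowers the
    entropy most; stop when the improvement is at most tol (assumed rule)."""
    n, n_features = len(X), len(X[0])
    dec = label_relation(labels)
    relations = [laplacian_relation([row[j] for row in X], sigma)
                 for j in range(n_features)]
    selected = []
    current = [[1.0] * n for _ in range(n)]  # empty subset: all samples related
    best_h = conditional_entropy(current, dec)
    while len(selected) < n_features:
        candidates = [
            (conditional_entropy(combine(current, relations[j]), dec), j)
            for j in range(n_features) if j not in selected
        ]
        h, j = min(candidates)
        if best_h - h <= tol:
            break
        selected.append(j)
        current = combine(current, relations[j])
        best_h = h
    return selected

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is constant.
X = [[0.0, 0.5], [0.1, 0.5], [5.0, 0.5], [5.1, 0.5]]
labels = [0, 0, 1, 1]
print(greedy_select(X, labels))  # prints [0]
```

On the toy data the constant feature contributes no reduction in entropy, so only the informative feature is kept. This sketch builds one n-by-n relation per feature, so on real gene expression data with thousands of genes the relations would more likely be computed on demand than stored.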

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b6f/8042210/c2599b76dc96/fgene-12-631505-g001.jpg
