College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China.
School of Computer and Information Science, Hunan Institute of Technology, Hengyang, 412002, China.
Sci Rep. 2018 Jun 5;8(1):8619. doi: 10.1038/s41598-018-26806-6.
In the present study, we introduce a novel semi-supervised method, the semi-supervised maximum discriminative local margin (semiMM), for gene selection in expression data. The semiMM is a "filter" approach that exploits local structure, variance, and mutual information. We first construct a local nearest neighbour graph and then divide it into within-class and between-class local nearest neighbour graphs by weighting the edge between each pair of data points. The semiMM aims to discover the most discriminative features for classification by maximizing three quantities: the local margin between the within-class and between-class data, the variance of all data, and the mutual information between features and class labels. Experiments on five publicly available gene expression datasets show that the proposed method is effective compared with three state-of-the-art feature selection algorithms.
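The three ingredients named above (local margin, variance, mutual information) can be illustrated with a minimal filter-style scoring sketch. This is not the authors' implementation: the abstract does not state the objective function or the edge-weighting scheme, so the sketch assumes unweighted k-nearest-neighbour edges, a histogram estimate of mutual information, and an unweighted sum of min-max scaled terms. The function names, the parameters `k` and `bins`, and the use of `-1` to mark unlabeled samples are all hypothetical choices made for illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Return indices of the k nearest neighbours (Euclidean) of each row."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def mutual_information(x, y, bins=5):
    """Histogram estimate of I(x; y) in nats for continuous x and discrete y."""
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.digitize(x, edges[1:-1])  # bin ids in 0..bins-1
    classes = np.unique(y)
    joint = np.zeros((bins, classes.size))
    for j, c in enumerate(classes):
        joint[:, j] = np.bincount(xb[y == c], minlength=bins)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def semimm_like_scores(X, y, k=3, bins=5):
    """Score each feature (higher is better).

    X: (n_samples, n_features) expression matrix.
    y: integer labels, with -1 marking unlabeled samples (assumed convention).
    """
    n, d = X.shape
    labeled = y >= 0
    nbrs = knn_indices(X, k)  # graph is built on all samples, labeled or not
    within = np.zeros(d)
    between = np.zeros(d)
    for i in range(n):
        for j in nbrs[i]:
            if not (labeled[i] and labeled[j]):
                continue  # edges touching unlabeled samples are skipped in the margin
            diff2 = (X[i] - X[j]) ** 2
            if y[i] == y[j]:
                within += diff2   # within-class local scatter per feature
            else:
                between += diff2  # between-class local scatter per feature
    margin = between - within
    variance = X.var(axis=0)  # unlabeled samples contribute here
    mi = np.array([mutual_information(X[labeled, f], y[labeled], bins)
                   for f in range(d)])

    def scale(v):
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    # Assumed combination rule: unweighted sum of min-max scaled terms.
    return scale(margin) + scale(variance) + scale(mi)
```

Ranking genes then amounts to sorting the returned scores in descending order, for example `np.argsort(-semimm_like_scores(X, y))`, and keeping the top-ranked indices. The pairwise distance computation is quadratic in the number of samples, which is usually acceptable for expression datasets with few samples and many genes.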