Mandal Monalisa, Mukhopadhyay Anirban
Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India.
PLoS One. 2014 Mar 13;9(3):e90949. doi: 10.1371/journal.pone.0090949. eCollection 2014.
The purpose of feature selection is to identify the relevant and non-redundant features in a dataset. In this article, feature selection is formulated as a graph-theoretic problem in which a feature-dissimilarity graph is constructed from the data matrix. The nodes represent features and the edges represent the dissimilarity between them. Nodes are weighted by the relevance of the corresponding feature, and edges are weighted by the dissimilarity between the features they connect. The problem of finding relevant and non-redundant features is then mapped to the problem of finding the densest subgraph. We propose a multiobjective particle swarm optimization (PSO)-based algorithm that simultaneously optimizes the average node weight and the average edge weight of the candidate subgraph. The proposed algorithm is applied to identify relevant and non-redundant disease-related genes from microarray gene expression data. Its performance is compared with that of several existing feature selection techniques on different real-life microarray gene expression datasets.
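The graph formulation described above can be illustrated with a minimal sketch. The abstract does not state which relevance or dissimilarity measures are used, so the code below assumes, purely for illustration, the absolute Pearson correlation with the class label as the node weight and one minus the absolute Pearson correlation between two features as the edge weight. The function names `build_graph` and `subgraph_objectives` are hypothetical, and the PSO search itself is not shown; only the two objectives that such a search would evaluate for a candidate feature subset are computed.

```python
import numpy as np

def build_graph(X, y):
    """Build node and edge weights of a feature-dissimilarity graph.

    X: (samples x features) data matrix, y: numeric class labels.
    Assumed measures (not taken from the article):
      node weight = |Pearson correlation(feature, label)|
      edge weight = 1 - |Pearson correlation(feature_i, feature_j)|
    Constant features would yield NaN correlations and should be
    removed beforehand.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Correlation matrix over all features plus the label as last column.
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    node_w = np.abs(corr[:n_features, -1])
    edge_w = 1.0 - np.abs(corr[:n_features, :n_features])
    np.fill_diagonal(edge_w, 0.0)  # no self-loops
    return node_w, edge_w

def subgraph_objectives(subset, node_w, edge_w):
    """Return (average node weight, average edge weight) of a subgraph.

    subset: indices of the selected features (the candidate subgraph).
    Both values are to be maximized: high relevance, low redundancy.
    """
    idx = np.asarray(subset, dtype=int)
    avg_node = float(node_w[idx].mean())
    k = idx.size
    if k < 2:
        return avg_node, 0.0  # a single node has no edges
    sub = edge_w[np.ix_(idx, idx)]
    # Average over each unordered pair once (upper triangle).
    avg_edge = float(sub[np.triu_indices(k, 1)].mean())
    return avg_node, avg_edge
```

Under these assumptions, a subset made of two highly correlated, highly relevant features scores well on average node weight but poorly on average edge weight, which is the trade-off a multiobjective search would explore.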