Gu Jian-lei, Lu Yao, Liu Cong, Lu Hui
Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai 200040, China; Key Laboratory of Molecular Embryology, Ministry of Health & Shanghai Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, China.
Department of Bioengineering, Bioinformatics Program, University of Illinois at Chicago, Chicago, IL 60607, USA.
J Theor Biol. 2014 Dec 7;362:3-8. doi: 10.1016/j.jtbi.2014.06.038. Epub 2014 Jul 8.
Feature selection is an important research topic in bioinformatics, and a large number of methods have been developed to date. Recently, several pathway-based feature selection protocols, such as the condition-responsive genes method, have been proposed to improve classification performance. However, these conventional pathway-based methods may select genes that are relevant but redundant within a given pathway while missing other useful genes. These methods are also limited to binary classification, whereas many clinical problems, such as the classification of sarcomas, call for a multiclass protocol. Here, we propose a new pathway-based feature selection method, named the Redundancy Removable Pathway based feature selection method (RRP), for binary and multiclass classification problems. Three classifiers were implemented to compare the classification performance and the functions of the selected genes across the gene-based, conventional pathway-based, and RRP methods. The validation results suggest that RRP is a feasible and robust feature selection method for multiclass prediction problems.
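The abstract does not spell out the RRP algorithm, so the following is only a generic sketch of the idea it describes: within one pathway, keep genes that are relevant to the class labels and drop genes that are redundant with ones already kept. Every choice here is an assumption made for illustration and is not taken from the paper: the one-way ANOVA F statistic as the relevance score (chosen because it handles both binary and multiclass labels), absolute Pearson correlation as the redundancy measure, the `max_corr` threshold, the function names, and the toy data.

```python
from math import sqrt


def f_score(values, labels):
    """One-way ANOVA F statistic for one gene (assumed relevance score)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # not enough classes or samples to score
    grand = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_nonredundant(expression, labels, pathway, max_corr=0.9):
    """Greedy filter: rank pathway genes by relevance, then keep a gene only
    if it is not strongly correlated with any gene already kept."""
    ranked = sorted(
        (g for g in pathway if g in expression),
        key=lambda g: f_score(expression[g], labels),
        reverse=True,
    )
    kept = []
    for gene in ranked:
        if all(abs(pearson(expression[gene], expression[k])) < max_corr for k in kept):
            kept.append(gene)
    return kept


# Hypothetical toy data: six samples in three classes, one pathway.
labels = ["A", "A", "B", "B", "C", "C"]
expression = {
    "g1": [1.0, 1.2, 3.0, 3.1, 5.0, 5.2],    # separates all three classes
    "g2": [2.0, 2.8, 6.0, 6.6, 10.0, 10.8],  # near-copy of g1 (redundant)
    "g3": [1.0, 1.1, 4.0, 4.2, 1.2, 0.9],    # separates class B only
}
pathway = ["g1", "g2", "g3", "g4"]  # g4 has no expression data and is skipped

selected = select_nonredundant(expression, labels, pathway)
print(selected)
```

On this toy pathway, `g2` scores as relevant but is dropped because it is almost perfectly correlated with the higher-ranked `g1`, while the less correlated `g3` is retained. A conventional relevance-only ranking that kept the top two genes would not make that distinction.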