Incremental forward feature selection with application to microarray gene expression data.

Authors

Lee Yuh-Jye, Chang Chien-Chung, Chao Chia-Huang

Affiliation

Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan.

Publication information

J Biopharm Stat. 2008;18(5):827-40. doi: 10.1080/10543400802277868.

DOI:10.1080/10543400802277868

PMID:18781519

Abstract

In this study, the authors propose a new feature selection scheme, incremental forward feature selection, which is inspired by incremental reduced support vector machines. In their method, a new feature is added to the currently selected feature subset if it brings in the most additional information. This information is measured by the distance between the new feature vector and the column space spanned by the current feature subset. The incremental forward feature selection scheme can therefore exclude highly linearly correlated features, which provide redundant information and might degrade the efficiency of learning algorithms. The method is compared with the weight score approach and the 1-norm support vector machine on two well-known microarray gene expression data sets, the acute leukemia and colon cancer data sets. Both data sets have very few observations but a huge number of genes. The linear smooth support vector machine was applied to the feature subsets selected by each of the three schemes, and slightly better classification results were obtained with the subsets chosen by the 1-norm support vector machine and by incremental forward feature selection. Finally, the authors claim that the remaining genes still contain some useful information. To test this, the previously selected features are iteratively removed from the data sets and the feature selection and classification steps are repeated for four rounds. The results show that there are many distinct feature subsets that can provide enough information for classification tasks in these two microarray gene expression data sets.
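The selection rule described in the abstract (add the feature whose vector lies farthest from the column space of the features already chosen) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `incremental_forward_selection`, the `tol` threshold, the column normalisation, and the tie-breaking for the first pick are all assumptions made for illustration, and the distances are computed here by Gram-Schmidt style deflation.

```python
import numpy as np


def incremental_forward_selection(X, n_select, tol=1e-8):
    """Greedy sketch: pick up to n_select columns of X (samples x features).

    At each step the column farthest from the span of the already
    selected columns is added. Names and defaults are hypothetical.
    """
    X = np.asarray(X, dtype=float)

    # Assumption: scale columns to unit length so distances are comparable.
    norms = np.linalg.norm(X, axis=0)
    nonzero = norms > tol
    R = np.zeros_like(X)
    R[:, nonzero] = X[:, nonzero] / norms[nonzero]

    selected = []
    for _ in range(n_select):
        # R holds each column's residual after projecting out the selected
        # columns, so its column norms are the distances to their span.
        dist = np.linalg.norm(R, axis=0)
        dist[selected] = 0.0
        j = int(np.argmax(dist))
        if dist[j] <= tol:
            # Every remaining feature is (numerically) a linear combination
            # of the selected ones, so nothing new can be added.
            break
        selected.append(j)
        q = R[:, j] / dist[j]
        # Remove the new direction q from every residual column.
        R -= np.outer(q, q @ R)
    return selected
```

Under these assumptions, a column that is an exact multiple of an already selected column has a residual of (numerically) zero and is never picked, which is the redundancy-exclusion behaviour the abstract describes. The repeated rounds mentioned at the end of the abstract could be mimicked by deleting the returned columns from `X` and calling the function again.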

Similar articles

Incremental forward feature selection with application to microarray gene expression data.

J Biopharm Stat. 2008;18(5):827-40. doi: 10.1080/10543400802277868.

A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

Artif Intell Med. 2007 Oct;41(2):161-75. doi: 10.1016/j.artmed.2007.07.008. Epub 2007 Sep 11.

Gene selection from microarray data for cancer classification--a machine learning approach.

Comput Biol Chem. 2005 Feb;29(1):37-46. doi: 10.1016/j.compbiolchem.2004.11.001.

What should be expected from feature selection in small-sample settings.

Bioinformatics. 2006 Oct 1;22(19):2430-6. doi: 10.1093/bioinformatics/btl407. Epub 2006 Jul 26.

Tumor classification ranking from microarray data.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Hybrid huberized support vector machines for microarray classification and gene selection.

Bioinformatics. 2008 Feb 1;24(3):412-9. doi: 10.1093/bioinformatics/btm579. Epub 2008 Jan 5.

New variable selection method using interval segmentation purity with application to blockwise kernel transform support vector machine classification of high-dimensional microarray data.

J Chem Inf Model. 2009 Aug;49(8):2002-9. doi: 10.1021/ci900032q.

Filter versus wrapper gene selection approaches in DNA microarray domains.

Artif Intell Med. 2004 Jun;31(2):91-103. doi: 10.1016/j.artmed.2004.01.007.

Gene extraction for cancer diagnosis by support vector machines--an improvement.

Artif Intell Med. 2005 Sep-Oct;35(1-2):185-94. doi: 10.1016/j.artmed.2005.01.006.

A novel feature selection approach for biomedical data classification.

J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30.

Cited by

A bibliometric and visual analysis of publications on artificial intelligence in colorectal cancer (2002-2022).

Front Oncol. 2023 Feb 7;13:1077539. doi: 10.3389/fonc.2023.1077539. eCollection 2023.

The role of electrostatic energy in prediction of obligate protein-protein interactions.

Proteome Sci. 2013 Nov 7;11(Suppl 1):S11. doi: 10.1186/1477-5956-11-S1-S11.

Lung cancer gene expression database analysis incorporating prior knowledge with support vector machine-based classification method.

J Exp Clin Cancer Res. 2009 Jul 18;28(1):103. doi: 10.1186/1756-9966-28-103.

A new regularized least squares support vector regression for gene selection.

BMC Bioinformatics. 2009 Feb 3;10:44. doi: 10.1186/1471-2105-10-44.
