Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis.

Author information

Li Guo-Zheng, Bu Hua-Long, Yang Mary Qu, Zeng Xue-Qiang, Yang Jack Y

Affiliations

Department of Control Science & Engineering, Tongji University, Shanghai 201804, PR China.

Publication information

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S24. doi: 10.1186/1471-2164-9-S2-S24.

Abstract

BACKGROUND

Dimension reduction is a critical issue in the analysis of microarray data, because the high dimensionality of gene expression microarray data sets hurts the generalization performance of classifiers. Dimension reduction methods fall into two types: feature selection and feature extraction. Principal component analysis (PCA) and partial least squares (PLS) are two frequently used feature extraction methods. In previous work, the top several components of PCA or PLS were selected for modeling according to the descending order of eigenvalues. In this paper, we show that not all of the top features are useful, and that features should instead be selected from all the components by feature selection methods.
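The point that eigenvalue order need not match discriminative power can be illustrated with a small sketch. The synthetic data, the helper names `pca_scores` and `separation`, and the separation score (difference of class means over pooled standard deviation) are all assumptions made for this illustration; none of them come from the paper.

```python
import numpy as np

def pca_scores(X):
    """Project X onto all principal components, ordered by decreasing eigenvalue."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (len(X) - 1)
    return Xc @ Vt.T, eigvals

def separation(z, y):
    """Absolute difference of class means divided by the pooled standard deviation."""
    a, b = z[y == 0], z[y == 1]
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled

# Hypothetical data: the largest-variance direction carries no class information.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
X = np.column_stack([
    rng.normal(0, 10, n),                       # large variance, pure noise
    rng.normal(np.where(y == 0, -2, 2), 0.5),   # smaller variance, separates the classes
    rng.normal(0, 0.1, n),                      # negligible variance, pure noise
])

Z, eigvals = pca_scores(X)
sep = np.array([separation(Z[:, j], y) for j in range(Z.shape[1])])
best = int(np.argmax(sep))
print("eigenvalues:", eigvals)
print("class separation per component:", sep)
print("most discriminative component index:", best)
```

In this constructed case the first component has the largest eigenvalue but almost no class separation, while the second component carries the class signal, so keeping only the top component would discard the useful one.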

RESULTS

We demonstrate a framework for selecting feature subsets from all the newly extracted components, leading to reduced classification error rates on gene expression microarray data. We consider an unsupervised method (PCA) and a supervised method (PLS) for extracting new components, genetic algorithms for feature selection, and support vector machines and k-nearest neighbor for classification. Experimental results illustrate that the proposed framework is effective at selecting feature subsets and reducing classification error rates.
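A minimal sketch of this kind of pipeline follows: extract all PCA components, then let a simple genetic algorithm search over component subsets, scoring each subset with a nearest-neighbor classifier. This is not the authors' implementation. The abstract does not state the GA settings or the fitness function, so the population size, number of generations, mutation rate, leave-one-out 1-NN fitness, synthetic data, and the function names `pca_transform`, `loo_1nn_accuracy`, and `ga_select` are all assumptions. PLS and support vector machines are omitted for brevity.

```python
import numpy as np

def pca_transform(X):
    """Return the projection of X onto all principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T

def loo_1nn_accuracy(Z, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on the selected columns."""
    if not mask.any():
        return 0.0
    S = Z[:, mask]
    d = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbor
    return float((y[d.argmin(axis=1)] == y).mean())

def ga_select(Z, y, rng, seed_mask, pop_size=20, generations=15, p_mut=0.1):
    """Search for a boolean component mask that maximizes the fitness above."""
    k = Z.shape[1]
    pop = rng.random((pop_size, k)) < 0.5
    pop[0] = seed_mask  # start from the conventional top-components choice
    best_mask, best_fit = None, -1.0
    for g in range(generations):
        fits = np.array([loo_1nn_accuracy(Z, y, m) for m in pop])
        i = int(fits.argmax())
        if fits[i] > best_fit:
            best_mask, best_fit = pop[i].copy(), float(fits[i])
        if g == generations - 1:
            break
        children = [best_mask.copy()]  # elitism: carry over the best mask so far
        while len(children) < pop_size:
            parents = []
            for _ in range(2):  # binary tournament selection
                a, b = rng.integers(pop_size, size=2)
                parents.append(pop[a] if fits[a] >= fits[b] else pop[b])
            cross = rng.random(k) < 0.5  # uniform crossover
            child = np.where(cross, parents[0], parents[1])
            child ^= rng.random(k) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    return best_mask, best_fit

# Hypothetical data: three high-variance noise features, one class-bearing feature.
rng = np.random.default_rng(0)
n, p = 60, 20
y = np.repeat([0, 1], n // 2)
X = rng.normal(0, 0.3, (n, p))
X[:, :3] = rng.normal(0, 5, (n, 3))
X[:, 3] = rng.normal(np.where(y == 0, -1.5, 1.5), 0.5)

Z = pca_transform(X)
top3 = np.arange(Z.shape[1]) < 3  # baseline: keep only the top three components
baseline = loo_1nn_accuracy(Z, y, top3)
best_mask, best_fit = ga_select(Z, y, rng, seed_mask=top3)
print("top-3 accuracy:", baseline)
print("GA-selected components:", np.flatnonzero(best_mask), "accuracy:", best_fit)
```

Because the top-three mask is seeded into the initial population and the best mask is carried forward, the selected subset never scores below the baseline on this fitness. The fitness is measured on the same samples used for selection, so it is optimistic; an honest error estimate would need a held-out test set or nested cross-validation.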

CONCLUSION

The top features newly extracted by PCA or PLS are not the only important ones. Feature selection should therefore be performed over all the new features to select subsets that improve the generalization performance of classifiers.
