Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis.

Author information

Li Guo-Zheng, Bu Hua-Long, Yang Mary Qu, Zeng Xue-Qiang, Yang Jack Y

Affiliation

Department of Control Science & Engineering, Tongji University, Shanghai 201804, PR China.

Publication information

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S24. doi: 10.1186/1471-2164-9-S2-S24.

DOI:10.1186/1471-2164-9-S2-S24

PMID:18831790

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2559889/

Abstract

BACKGROUND

Dimension reduction is a critical issue in the analysis of microarray data, because the high dimensionality of gene expression microarray data sets hurts the generalization performance of classifiers. Dimension reduction methods fall into two types: feature selection and feature extraction. Principal component analysis (PCA) and partial least squares (PLS) are two frequently used feature extraction methods, and in previous work the top several components of PCA or PLS were selected for modeling according to the descending order of eigenvalues. In this paper, we show that not all of the top features are useful, and that features should instead be selected from all the components by feature selection methods.
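The point about eigenvalue ordering can be illustrated with a minimal sketch. It is not taken from the paper: the synthetic data, the `separation` score, and all function names are assumptions chosen for illustration. It computes every principal component, ordered by descending eigenvalue, and then checks which component best separates two classes.

```python
import numpy as np

def pca_components(X):
    """Return (scores, eigenvalues), components sorted by descending eigenvalue."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (X.shape[0] - 1)          # eigenvalues of the covariance matrix
    scores = Xc @ Vt.T                           # projections onto all components
    return scores, eigvals

def separation(scores, y):
    """Hypothetical score: |difference of class means| / overall std, per component."""
    diff = scores[y == 1].mean(axis=0) - scores[y == 0].mean(axis=0)
    return np.abs(diff) / scores.std(axis=0)

# Synthetic data (an assumption): feature 0 is high-variance noise,
# feature 1 carries the class signal with much smaller variance.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(scale=0.1, size=(100, 5))
X[:, 0] = rng.normal(scale=10.0, size=100)
X[:, 1] = np.where(y == 1, 1.0, -1.0) + rng.normal(scale=0.3, size=100)

scores, eigvals = pca_components(X)
sep = separation(scores, y)
best_component = int(np.argmax(sep))

print("eigenvalues:", np.round(eigvals, 3))
print("class separation per component:", np.round(sep, 3))
print("most discriminative component index:", best_component)
```

In this toy setting the component with the largest eigenvalue mostly captures noise, so ranking by eigenvalue alone would keep an uninformative feature first. Real microarray data will behave less cleanly.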

RESULTS

We demonstrate a framework for selecting feature subsets from all the newly extracted components, which leads to reduced classification error rates on gene expression microarray data. We consider an unsupervised method (PCA) and a supervised method (PLS) for extracting new components, genetic algorithms for feature selection, and support vector machines and k-nearest neighbor for classification. Experimental results show that the proposed framework is effective at selecting feature subsets and reducing classification error rates.
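The overall pipeline (extract all components, then let a genetic algorithm pick a subset scored by a classifier) might be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses only PCA and a k-nearest-neighbor classifier (no PLS or support vector machine), synthetic data, leave-one-out accuracy as the fitness, and hypothetical GA settings (`pop_size`, `generations`, `p_mut`, tournament selection, uniform crossover). The abstract does not specify any of these.

```python
import numpy as np

def pca_scores(X):
    """Project centered data onto all principal components (descending eigenvalue)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T

def loo_knn_accuracy(Z, y, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbor classifier (binary labels 0/1)."""
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)                  # a sample cannot be its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    pred = (y[nn].mean(axis=1) > 0.5).astype(int)
    return float((pred == y).mean())

def fitness(mask, Z, y):
    """Fitness of a component subset; an empty subset scores zero."""
    return loo_knn_accuracy(Z[:, mask], y) if mask.any() else 0.0

def ga_select(Z, y, init_mask, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Toy GA over boolean masks; all operators and settings are assumptions."""
    rng = np.random.default_rng(seed)
    n = Z.shape[1]
    pop = rng.random((pop_size, n)) < 0.5
    pop[0] = init_mask                           # seed with the conventional top-k choice
    best_mask, best_fit = None, -1.0
    for _ in range(generations):
        fits = np.array([fitness(m, Z, y) for m in pop])
        i = int(np.argmax(fits))
        if fits[i] > best_fit:
            best_fit, best_mask = float(fits[i]), pop[i].copy()
        a, b = rng.integers(pop_size, size=(2, pop_size))      # tournament selection
        parents = np.where((fits[a] >= fits[b])[:, None], pop[a], pop[b])
        partners = parents[rng.permutation(pop_size)]
        cross = rng.random((pop_size, n)) < 0.5                # uniform crossover
        children = np.where(cross, parents, partners)
        children ^= rng.random((pop_size, n)) < p_mut          # bit-flip mutation
        children[0] = best_mask                                # elitism
        pop = children
    return best_mask, best_fit

# Synthetic data (an assumption): two high-variance noise features, one signal feature.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(scale=0.2, size=(60, 10))
X[:, :2] = rng.normal(scale=5.0, size=(60, 2))
X[:, 2] = np.where(y == 1, 1.0, -1.0) + rng.normal(scale=0.3, size=60)

Z = pca_scores(X)
top_mask = np.arange(Z.shape[1]) < 2             # conventional choice: top 2 components
top_fit = fitness(top_mask, Z, y)
best_mask, best_fit = ga_select(Z, y, init_mask=top_mask)

print("top-2 components, leave-one-out accuracy:", top_fit)
print("GA-selected components:", np.flatnonzero(best_mask), "accuracy:", best_fit)
```

Because the fitness is measured on the same samples used for selection, the reported accuracy is optimistic; an honest comparison would evaluate the selected subset on held-out data.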

CONCLUSION

The top features newly extracted by PCA or PLS are not the only important ones; therefore, feature selection should be performed to select subsets from the new features to improve the generalization performance of classifiers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7726/2559889/266096fec497/1471-2164-9-S2-S24-1.jpg
