A stable gene selection in microarray data analysis.

Authors

Yang Kun, Cai Zhipeng, Li Jianzhong, Lin Guohui

Affiliation

Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, China.

Publication information

BMC Bioinformatics. 2006 Apr 27;7:228. doi: 10.1186/1471-2105-7-228.

DOI:10.1186/1471-2105-7-228

PMID:16643657

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1524991/

Abstract

BACKGROUND

Microarray data analysis is notorious for involving a huge number of genes compared to a relatively small number of samples. Gene selection aims to detect the most significantly differentially expressed genes under different conditions, and it has been a central research focus. In general, a better gene selection method can significantly improve classification performance. One difficulty in gene selection is that the number of samples can vary widely across conditions.

RESULTS

This paper proposes two novel gene selection methods that are not affected by unbalanced sample class sizes and do not assume any explicit statistical model of the gene expression values. They were evaluated on eight publicly available microarray datasets, using leave-one-out cross-validation and 5-fold cross-validation. Performance is measured by the classification accuracy obtained with the top-ranked genes, which are selected using the training datasets.
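The evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the two proposed scoring methods, so a generic signal-to-noise-style score stands in for them. The function names (`class_stats`, `rank_genes`, `knn_predict`, `loocv_accuracy`), the parameters `top` and `k`, and the synthetic data are all hypothetical. The point it shows is that genes are ranked on the training samples only, inside each leave-one-out round, before the held-out sample is classified.

```python
import math
import random


def class_stats(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / max(len(values) - 1, 1)
    return mean, math.sqrt(var)


def rank_genes(X, y):
    """Rank gene indices, best first, by a signal-to-noise-style score.

    ASSUMPTION: this score is a generic stand-in, not the paper's method.
    Expects exactly two class labels in y.
    """
    first, second = sorted(set(y))
    scores = []
    for g in range(len(X[0])):
        mean_a, sd_a = class_stats([r[g] for r, lab in zip(X, y) if lab == first])
        mean_b, sd_b = class_stats([r[g] for r, lab in zip(X, y) if lab == second])
        scores.append(abs(mean_a - mean_b) / (sd_a + sd_b + 1e-9))
    return sorted(range(len(scores)), key=lambda g: -scores[g])


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    dists = sorted((math.dist(row, x), lab) for row, lab in zip(train_X, train_y))
    votes = [lab for _, lab in dists[:k]]
    return max(set(votes), key=votes.count)


def loocv_accuracy(X, y, top=5, k=3):
    """Leave-one-out accuracy; genes are selected on the training split only."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = rank_genes(train_X, train_y)[:top]  # held-out sample is never seen
        reduced = [[row[g] for g in genes] for row in train_X]
        held_out = [X[i][g] for g in genes]
        correct += knn_predict(reduced, train_y, held_out, k) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic, unbalanced data: 6 samples of class 0, 18 of class 1, 50 genes.
    # Only the first 3 genes differ between the two classes.
    y = [0] * 6 + [1] * 18
    X = [[random.gauss(6.0 * lab if g < 3 else 0.0, 1.0) for g in range(50)]
         for lab in y]
    print("LOOCV accuracy:", loocv_accuracy(X, y, top=5, k=3))
```

Ranking inside the loop, rather than once on the full dataset, keeps the held-out sample from influencing which genes are chosen. The same structure extends to 5-fold cross-validation by holding out one fold at a time instead of one sample.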

CONCLUSION

The experimental results showed that the proposed gene selection methods are efficient, effective, and robust in identifying differentially expressed genes. With existing SVM-based and KNN-based classifiers, the genes selected by the proposed methods generally give more accurate classification results, particularly when the sample class sizes in the training dataset are unbalanced.
