A comparative study of different machine learning methods on microarray gene expression data.

Author information

Pirooznia Mehdi, Yang Jack Y, Yang Mary Qu, Deng Youping

Affiliation

Department of Biological Sciences, University of Southern Mississippi, Hattiesburg 39406, USA.

Publication information

BMC Genomics. 2008;9 Suppl 1(Suppl 1):S13. doi: 10.1186/1471-2164-9-S1-S13.

DOI:10.1186/1471-2164-9-S1-S13

PMID:18366602

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2386055/

Abstract

BACKGROUND

Several classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data. Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods have been used in recent studies. The accuracy of these methods has been calculated with validation methods such as v-fold cross-validation. However, there is a lack of comparison between these methods to find a better framework for classification, clustering and analysis of microarray gene expression results.
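As an illustration of how v-fold cross-validation estimates classifier accuracy, here is a minimal sketch in plain Python. It is not the authors' implementation: the nearest-centroid classifier is only a stand-in for the methods listed above, and the function names, the unstratified fold assignment, and the toy data are all assumptions made for this example.

```python
import random

def v_fold_splits(n_samples, v, seed=0):
    """Yield (train_idx, test_idx) pairs for v-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::v] for i in range(v)]  # v folds of nearly equal size
    for i in range(v):
        test_idx = folds[i]
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx

def nearest_centroid_fit(X, y):
    """Return the mean expression profile of each class (stand-in classifier)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))

def cross_val_accuracy(X, y, v=5, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for train_idx, test_idx in v_fold_splits(len(X), v, seed):
        model = nearest_centroid_fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i]
                       for i in test_idx)
    return correct / len(X)

# Toy two-class data (hypothetical): 20 samples, 2 "genes", gene 0 carries the signal.
rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[rng.gauss(3.0 * lab, 1.0), rng.gauss(0.0, 1.0)] for lab in y]
print(cross_val_accuracy(X, y, v=5))
```

Every sample is used for testing exactly once, so the returned value is an accuracy estimate over the whole dataset rather than over a single held-out split.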

RESULTS

In this study, we compared the efficiency of classification methods including SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods. We used v-fold cross-validation to calculate the accuracy of the classifiers. Some common clustering methods, including K-means, DBC, and EM clustering, were applied to the datasets, and the efficiency of these methods was analysed. Further, the efficiency of feature selection methods including support vector machine recursive feature elimination (SVM-RFE), Chi Squared, and correlation-based feature selection (CFS) was compared. In each case these methods were applied to eight different binary (two-class) microarray datasets. We evaluated the class prediction efficiency of each gene list in training and test cross-validation using supervised classifiers.
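To make the idea of a chi-squared gene filter concrete, the following sketch scores each gene by the chi-squared statistic of a 2x2 table (expression above or below the gene's median, against class 0 or 1) and keeps the top-ranked genes. The median split is an assumption chosen for simplicity, not the discretisation used in the study, and the function names are hypothetical. Class labels are assumed to be the integers 0 and 1.

```python
def chi_squared_score(values, labels):
    """Chi-squared statistic for one gene: (above/below median) x (class 0/1)."""
    ordered = sorted(values)
    n = len(ordered)
    median = (ordered[(n - 1) // 2] + ordered[n // 2]) / 2
    # observed[row][col]: row 1 = expression above the median, col = class label
    observed = [[0, 0], [0, 0]]
    for value, label in zip(values, labels):
        observed[int(value > median)][label] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][c] + observed[1][c] for c in (0, 1)]
    stat = 0.0
    for r in (0, 1):
        for c in (0, 1):
            expected = row_totals[r] * col_totals[c] / n
            if expected > 0:  # skip empty margins to avoid division by zero
                stat += (observed[r][c] - expected) ** 2 / expected
    return stat

def rank_genes(X, y, top_k):
    """Return the indices of the top_k genes by chi-squared score.

    X is a list of samples, each a list of expression values (one per gene).
    """
    n_genes = len(X[0])
    scores = [chi_squared_score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]

# Toy example (hypothetical data): gene 0 separates the classes, gene 1 does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
samples = [[1, 1], [2, 10], [3, 2], [4, 11], [10, 3], [11, 12], [12, 4], [13, 13]]
print(rank_genes(samples, labels, top_k=1))
```

A filter like this scores each gene independently of any classifier, whereas SVM-RFE repeatedly retrains a classifier and discards the lowest-weighted genes, so the two approaches can produce different gene lists from the same data.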

CONCLUSIONS

We presented a study comparing some commonly used classification, clustering, and feature selection methods. We applied these methods to eight publicly available datasets and compared how they performed in class prediction of test datasets. We found that the choice of feature selection method, the number of genes in the gene list, and the number of cases (samples) substantially influence classification success. Based on the features chosen by these methods, error rates and accuracies of several classification algorithms were obtained. The results revealed the importance of feature selection in accurately classifying new samples and showed how an integrated feature selection and classification algorithm performs and can identify significant genes.
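One way to explore how gene-list size affects classification success is sketched below. It is a hedged illustration, not the study's pipeline: the mean-difference filter and nearest-centroid classifier are simple stand-ins for the methods compared in the paper, the stratified fold assignment and all function names are assumptions, and the data are synthetic. In this sketch, genes are selected from the training folds only, so the held-out fold does not influence the gene list.

```python
import random

def mean_diff_scores(X, y):
    """Score each gene by the absolute difference of its two class means."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return scores

def stratified_folds(y, v, seed=0):
    """Deal the samples of each class round-robin into v folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(v)]
    for label in (0, 1):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % v].append(i)
    return folds

def predict(train_X, train_y, x, genes):
    """Nearest-centroid prediction restricted to the selected genes."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        dist = sum((x[g] - sum(r[g] for r in rows) / len(rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy_top_k(X, y, k, v=5, seed=0):
    """Cross-validated accuracy when only the top k genes are used."""
    folds = stratified_folds(y, v, seed)
    correct = 0
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        train_X = [X[j] for j in train_idx]
        train_y = [y[j] for j in train_idx]
        scores = mean_diff_scores(train_X, train_y)  # uses training data only
        genes = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]
        correct += sum(predict(train_X, train_y, X[j], genes) == y[j] for j in test_idx)
    return correct / len(X)

# Synthetic data: 40 samples, 50 genes, only the first 3 genes carry class signal.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(2.0 * lab if g < 3 else 0.0, 1.0) for g in range(50)] for lab in y]
for k in (1, 3, 10, 50):
    print(k, cv_accuracy_top_k(X, y, k))
```

Running the loop over several values of k gives a rough picture of how accuracy changes as uninformative genes are added to the list. The numbers depend entirely on the synthetic data and say nothing about the eight datasets in the study.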

Similar articles

A comparative study of different machine learning methods on microarray gene expression data.

BMC Genomics. 2008;9 Suppl 1(Suppl 1):S13. doi: 10.1186/1471-2164-9-S1-S13.

Recursive cluster elimination (RCE) for classification and feature selection from gene expression data.

BMC Bioinformatics. 2007 May 2;8:144. doi: 10.1186/1471-2105-8-144.

Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE.

BMC Bioinformatics. 2006 Dec 25;7:543. doi: 10.1186/1471-2105-7-543.

Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis.

IEEE/ACM Trans Comput Biol Bioinform. 2007 Jul-Sep;4(3):365-81. doi: 10.1109/TCBB.2007.70224.

Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data.

BMC Bioinformatics. 2006 Jul 26;7:359. doi: 10.1186/1471-2105-7-359.

MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data.

Bioinformatics. 2007 May 1;23(9):1106-14. doi: 10.1093/bioinformatics/btm036.

Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction.

BMC Bioinformatics. 2011 Sep 23;12:375. doi: 10.1186/1471-2105-12-375.

Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data.

BMC Bioinformatics. 2006 Apr 10;7:197. doi: 10.1186/1471-2105-7-197.

Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets.

BMC Bioinformatics. 2007 Oct 26;8:415. doi: 10.1186/1471-2105-8-415.

Enhancing the prediction of IDC breast cancer staging from gene expression profiles using hybrid feature selection methods and deep learning architecture.

Med Biol Eng Comput. 2023 Nov;61(11):2895-2919. doi: 10.1007/s11517-023-02892-1. Epub 2023 Aug 2.

Cited by

EsoDetect: computational validation and algorithm development of a novel diagnostic and prognostic tool for dysplasia in Barrett's esophagus.

PeerJ. 2025 Jul 3;13:e19613. doi: 10.7717/peerj.19613. eCollection 2025.

Detection of flap malperfusion after microsurgical tissue reconstruction using hyperspectral imaging and machine learning.

Sci Rep. 2025 May 5;15(1):15637. doi: 10.1038/s41598-025-98874-4.

Innovative approach towards early prediction of ovarian cancer: Machine learning-enabled XAI techniques.

Heliyon. 2024 Apr 15;10(9):e29197. doi: 10.1016/j.heliyon.2024.e29197. eCollection 2024 May 15.

Cancer genetics and deep learning applications for diagnosis, prognosis, and categorization.

J Biol Methods. 2024 Aug 9;11(3):e99010017. doi: 10.14440/jbm.2024.0016. eCollection 2024.

Comparing machine learning screening approaches using clinical data and cytokine profiles for COVID-19 in resource-limited and resource-abundant settings.

Sci Rep. 2024 Jun 28;14(1):14892. doi: 10.1038/s41598-024-63707-3.

Pathway-based analyses of gene expression profiles at low doses of ionizing radiation.

Front Bioinform. 2024 May 14;4:1280971. doi: 10.3389/fbinf.2024.1280971. eCollection 2024.

A Machine Learning Approach to Simulate Gene Expression and Infer Gene Regulatory Networks.

Entropy (Basel). 2023 Aug 15;25(8):1214. doi: 10.3390/e25081214.

On the challenges of predicting treatment response in Hodgkin's Lymphoma using transcriptomic data.

BMC Med Genomics. 2023 Jul 20;16(Suppl 1):170. doi: 10.1186/s12920-023-01508-9.

Classification of porcine reproductive and respiratory syndrome clinical impact in Ontario sow herds using machine learning approaches.

Front Vet Sci. 2023 Jun 7;10:1175569. doi: 10.3389/fvets.2023.1175569. eCollection 2023.

Utilization of Computer Classification Methods for Exposure Prediction and Gene Selection in Toxicogenomics.

Biology (Basel). 2023 May 9;12(5):692. doi: 10.3390/biology12050692.
