A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification.

Authors

Statnikov Alexander, Wang Lily, Aliferis Constantin F

Affiliation

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.

Publication

BMC Bioinformatics. 2008 Jul 22;9:319. doi: 10.1186/1471-2105-9-319.

DOI:10.1186/1471-2105-9-319

PMID:18647401

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2492881/

Abstract

BACKGROUND

Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain.

RESULTS

In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underline the importance of sound research design in benchmarking and comparison of bioinformatics algorithms.

CONCLUSION

We found that both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines both in the settings when no gene selection is performed and when several popular gene selection methods are used.
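The conclusion rests on two summaries of a paired comparison: the average difference across datasets and the number of datasets on which one classifier beats the other. The sketch below shows one way such summaries could be computed from per-dataset scores. It is illustrative only: the function name `summarize_paired`, the score values, and the use of a two-sided sign test are all assumptions made here, not the statistical procedure or the results reported in the paper.

```python
from math import comb
from statistics import mean


def summarize_paired(scores_a, scores_b):
    """Summarize paired per-dataset scores for classifier A versus classifier B."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    wins = sum(d > 0 for d in diffs)    # datasets where A scores higher
    losses = sum(d < 0 for d in diffs)  # datasets where B scores higher
    n = wins + losses                   # ties are dropped for the sign test

    # Two-sided sign test: probability of a win/loss split at least this
    # lopsided if each classifier were equally likely to win on a dataset.
    if n == 0:
        p_value = 1.0
    else:
        k = max(wins, losses)
        tail = sum(comb(n, i) for i in range(k, n + 1))
        p_value = min(1.0, 2 * tail / 2 ** n)

    return {
        "mean_diff": mean(diffs),
        "wins": wins,
        "losses": losses,
        "ties": len(diffs) - n,
        "sign_test_p": p_value,
    }


# Hypothetical placeholder scores (NOT taken from the paper), one per dataset.
svm_scores = [0.95, 0.88, 0.91, 0.79, 0.99, 0.84]
rf_scores = [0.93, 0.85, 0.92, 0.70, 0.99, 0.80]

summary = summarize_paired(svm_scores, rf_scores)
print(summary)
```

Reporting both the mean difference and the win count guards against a single dataset with a large gap dominating the average, which is presumably why the abstract states the result in both forms.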
