A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification.

Author information

Statnikov Alexander, Wang Lily, Aliferis Constantin F

Affiliation

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.

Publication information

BMC Bioinformatics. 2008 Jul 22;9:319. doi: 10.1186/1471-2105-9-319.

Abstract

BACKGROUND

Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Developing the best possible molecular signatures for patient care requires the most accurate classification algorithms available for microarray gene expression data. A large body of literature to date suggests that support vector machines can be considered "best of class" algorithms for classifying such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain.

RESULTS

In the present paper we identify methodological biases in prior work comparing random forests and support vector machines, and we conduct a new, rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underline the importance of sound research design in benchmarking and comparing bioinformatics algorithms.

CONCLUSION

We found that, both on average and in the majority of microarray datasets, support vector machines outperform random forests, whether no gene selection is performed or several popular gene selection methods are used.
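The conclusion rests on two distinct summaries of a paired comparison: the average difference across datasets, and the number of datasets on which one algorithm wins. The sketch below illustrates how such a summary could be computed. It is not the authors' analysis code: the function name `compare_across_datasets` is hypothetical, the performance metric is left unspecified, and the score values are invented placeholders, not results from the paper.

```python
from statistics import mean


def compare_across_datasets(scores_a, scores_b):
    """Summarize a paired comparison of two classifiers.

    scores_a[i] and scores_b[i] are performance estimates for
    classifier A and classifier B on the same dataset i.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")

    # Per-dataset differences; positive means A did better on that dataset.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    wins_a = sum(d > 0 for d in diffs)
    ties = sum(d == 0 for d in diffs)

    return {
        "mean_difference": mean(diffs),            # "on average"
        "wins_a": wins_a,
        "ties": ties,
        "wins_b": len(diffs) - wins_a - ties,
        "a_wins_majority": wins_a > len(diffs) / 2,  # "in the majority of datasets"
    }


# Placeholder scores, invented purely for illustration (NOT from the paper).
svm_scores = [0.90, 0.85, 0.70, 0.95, 0.88]
rf_scores = [0.80, 0.85, 0.75, 0.85, 0.80]

summary = compare_across_datasets(svm_scores, rf_scores)
print(summary)
```

Reporting both numbers matters because they can disagree: one algorithm can win on most datasets by small margins yet lose on average if it fails badly on a few.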

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b972/2492881/8949962b1bb0/1471-2105-9-319-1.jpg
