Slawski M, Daumer M, Boulesteix A-L
Sylvia Lawry Centre for Multiple Sclerosis Research, Munich, Germany.
BMC Bioinformatics. 2008 Oct 16;9:439. doi: 10.1186/1471-2105-9-439.
For the last eight years, microarray-based classification has been a major topic in statistics, bioinformatics and biomedical research. Traditional methods often yield unsatisfactory results or may even be inapplicable in the so-called "p >> n" setting, where the number of predictors p far exceeds the number of observations n, hence the term "ill-posed problem". Careful model selection and evaluation that satisfy accepted good-practice standards are a very complex task for statisticians without experience in this area and for scientists with limited statistical background. The multiplicity of available methods for class prediction based on high-dimensional data is an additional practical challenge for inexperienced researchers.
In this article, we introduce a new Bioconductor package called CMA (standing for "Classification for MicroArrays") that automatically performs variable selection, parameter tuning, classifier construction, and unbiased evaluation of the constructed classifiers using a large number of standard methods. With little time and effort, users obtain an overview of the unbiased accuracy of most top-performing classifiers. Furthermore, the standardized evaluation framework underlying CMA can also benefit statistical research for comparison purposes, for instance when a new classifier has to be compared to existing approaches.
CMA is a user-friendly, comprehensive package for classifier construction and evaluation that implements most standard approaches. It is freely available from the Bioconductor website at http://bioconductor.org/packages/2.3/bioc/html/CMA.html.
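To make the notion of "unbiased evaluation" in the "p >> n" setting more concrete, the following is a minimal sketch in plain Python. It is not CMA's API (CMA is an R package) and does not reproduce its methods. All names (`make_data`, `select_features`, `fit_centroids`, `predict`, `cv_error`), the simulated data, the mean-difference ranking and the nearest-centroid classifier are assumptions chosen for illustration. Parameter tuning is omitted. The point it illustrates is that variable selection is repeated inside every cross-validation fold, using training observations only, so that the test observations never influence the selected variables.

```python
import random
from statistics import mean


def make_data(n=40, p=200, informative=5, shift=2.0, seed=0):
    """Simulate a two-class data set with many more predictors than samples."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced classes
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        if label == 1:
            # Only the first `informative` predictors differ between classes.
            for j in range(informative):
                row[j] += shift
        X.append(row)
        y.append(label)
    return X, y


def select_features(X, y, k):
    """Rank predictors by absolute difference of class means; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        m0 = mean(row[j] for row, lab in zip(X, y) if lab == 0)
        m1 = mean(row[j] for row, lab in zip(X, y) if lab == 1)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def fit_centroids(X, y, features):
    """Compute one centroid per class, restricted to the selected predictors."""
    centroids = {}
    for lab in sorted(set(y)):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(r[j] for r in rows) for j in features]
    return centroids


def predict(row, centroids, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def cv_error(X, y, k_features=5, folds=5, seed=1):
    """Cross-validated error rate with selection redone inside each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    errors = 0
    for f in range(folds):
        test = idx[f::folds]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        # Selection and fitting see the training observations only.
        feats = select_features(X_train, y_train, k_features)
        cents = fit_centroids(X_train, y_train, feats)
        errors += sum(predict(X[i], cents, feats) != y[i] for i in test)
    return errors / len(X)


if __name__ == "__main__":
    X, y = make_data()
    print("cross-validated error rate:", cv_error(X, y))
```

Selecting variables once on the full data set and cross-validating only the classifier would let the test observations influence the selection, which typically makes the estimated error too optimistic. Keeping every data-dependent step inside the resampling loop is what avoids this.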