Ancona Nicola, Maglietta Rosalia, D'Addabbo Annarita, Liuni Sabino, Pesole Graziano
Istituto di Studi sui Sistemi Intelligenti per l'Automazione, CNR, Via Amendola 122/D-I, 70126 Bari, Italy.
BMC Bioinformatics. 2005 Dec 1;6 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-6-S4-S2.
The advent of DNA microarray technology marks an epochal change in the classification and discovery of different types of cancer, because the information provided by DNA microarrays allows cancer analysis to be approached from a quantitative rather than a qualitative point of view. Cancer classification requires well-founded mathematical methods that can predict the status of new specimens with high significance levels starting from a limited amount of data. In this paper we assess the performance of Regularized Least Squares (RLS) classifiers, originally proposed in regularization theory, by comparing them with Support Vector Machines (SVM), the state-of-the-art supervised learning technique for cancer classification from DNA microarray data. The performance of both approaches has also been investigated with respect to the number of selected genes and different gene selection strategies.
We show that RLS classifiers perform comparably to SVM classifiers, as indicated by the Leave-One-Out (LOO) error evaluated on three different data sets. The main advantage of RLS machines is that solving a classification problem requires only a linear system whose order equals either the number of features or the number of training examples. Moreover, RLS machines yield an exact measure of the LOO error from a single training run.
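The two properties stated above can be illustrated with a minimal sketch. It assumes a linear kernel, no bias term, labels coded as +1/-1, and NumPy; the function names and the regularization parameter `lam` are hypothetical choices for illustration. The LOO shortcut used here is the standard closed form for ridge-type estimators and is not necessarily the exact expression used in the paper.

```python
import numpy as np

def rls_fit_primal(X, y, lam):
    """Solve (X^T X + lam*I) w = X^T y: a system of order d (number of features)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def rls_fit_dual(X, y, lam):
    """Solve (X X^T + lam*I) c = y: a system of order n (number of examples)."""
    n = X.shape[0]
    c = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ c  # same weight vector as the primal solution

def rls_loo_residuals(X, y, lam):
    """Exact LOO residuals y_i - f_{-i}(x_i) from a single fit.

    With G = (K + lam*I)^{-1} and c = G y, the i-th LOO residual is c_i / G_ii.
    """
    n = X.shape[0]
    G = np.linalg.inv(X @ X.T + lam * np.eye(n))
    c = G @ y
    return c / np.diag(G)

def rls_loo_error(X, y, lam):
    """Fraction of examples misclassified when each one is left out in turn."""
    loo_predictions = y - rls_loo_residuals(X, y, lam)
    return np.mean(np.sign(loo_predictions) != y)
```

For microarray data, where there are typically far fewer specimens than genes, the dual form is the cheaper of the two systems. Once a subset of genes smaller than the number of specimens has been selected, the primal form becomes the smaller one.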
RLS classifiers are a valuable alternative to SVM classifiers for cancer classification from gene expression data, owing to their simplicity and low computational complexity. Moreover, RLS classifiers show generalization ability comparable to that of SVM classifiers even when the classification of new specimens involves very few gene expression levels.