Hang Xiyi, Wu Fang-Xiang
Department of Electrical and Computer Engineering, California State University, Northridge, CA 91330, USA.
J Biomed Biotechnol. 2009;2009:403689. doi: 10.1155/2009/403689. Epub 2009 Mar 15.
Personalized drug design requires classifying cancer patients as accurately as possible. With advances in genome sequencing and microarray technology, a large amount of gene expression data has been, and will continue to be, produced from patients with various cancers. Such cancer-related gene expression data allow tumors to be classified at the genome-wide level. However, these datasets typically contain far more genes (features) than samples (patients), which makes tumor classification challenging. In this paper, a new method is proposed for cancer diagnosis using gene expression data: the classification problem is cast as finding a sparse representation of each test sample with respect to the training samples. The sparse representation is computed by the l1-regularized least squares method. To investigate its performance, the proposed method is applied to six tumor gene expression datasets and compared with various support vector machine (SVM) methods. The experimental results show that the performance of the proposed method is comparable to or better than that of the SVMs. In addition, the proposed method is more efficient than SVMs because it requires no model selection.
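The idea of classifying a test sample through its sparse representation over the training samples can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which solver or decision rule was used, so the sketch assumes a plain ISTA (iterative soft-thresholding) solver for the l1-regularized least squares problem and the common rule of assigning the class whose training samples give the smallest reconstruction residual. The function names, the regularization weight `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def l1_least_squares(A, y, lam=0.01, n_iter=2000):
    """Approximately minimize 0.5 * ||A x - y||_2^2 + lam * ||x||_1 with ISTA."""
    # Step size 1/L, where L is the largest eigenvalue of A^T A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x


def src_predict(A, labels, y, lam=0.01):
    """Assign y to the class whose training columns best reconstruct it.

    A has one column per training sample (genes x samples); this residual
    rule is an assumption of the sketch, not taken from the abstract.
    """
    # Normalize each training sample and the test sample to unit length.
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    x = l1_least_squares(A, y, lam)
    classes = np.unique(labels)
    # Residual when only the coefficients of one class are kept.
    residuals = [
        np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]


# Synthetic toy data (not from the paper): 50 "genes", 2 classes, 5 samples each,
# so there are more features than samples, as in the setting described above.
rng = np.random.default_rng(0)
centers = rng.normal(size=(50, 2))
labels = np.array([0] * 5 + [1] * 5)
A_train = centers[:, labels] + 0.1 * rng.normal(size=(50, 10))
y_test = centers[:, 1] + 0.1 * rng.normal(size=50)
predicted = src_predict(A_train, labels, y_test)
print(predicted)
```

In this sketch the only tunable quantity is `lam`, which is held at a fixed, arbitrarily chosen value rather than searched over; no classifier is trained ahead of time, since each test sample is represented directly in terms of the training samples.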