Robust and accurate cancer classification with gene expression profiling.

Authors

Li Haifeng, Zhang Keshu, Jiang Tao

Affiliation

Dept. of Computer Science, University of California at Riverside, Riverside, CA 92521, USA.

Publication information

Proc IEEE Comput Syst Bioinform Conf. 2005:310-21. doi: 10.1109/csb.2005.49.

Abstract

Robust and accurate cancer classification is critical in cancer treatment. Gene expression profiling is expected to enable us to diagnose tumors precisely and systematically. However, the classification task in this context is very challenging because of the curse of dimensionality and the small sample size problem. In this paper, we propose a novel method to solve these two problems. Our method maps gene expression data into a very low-dimensional space and thus meets the recommended samples-to-features ratio per class. As a result, it can be used to classify new samples robustly with low and trustworthy (estimated) error rates. The method is based on linear discriminant analysis (LDA). However, conventional LDA requires that the within-class scatter matrix S(w) be nonsingular. Unfortunately, S(w) is always singular in cancer classification because of the small sample size problem. To overcome this problem, we develop a generalized linear discriminant analysis (GLDA) that is a general, direct, and complete solution for optimizing Fisher's criterion. GLDA is mathematically well-founded and coincides with conventional LDA when S(w) is nonsingular. Unlike conventional LDA, GLDA does not assume that S(w) is nonsingular, and thus naturally solves the small sample size problem. To accommodate the high dimensionality of the scatter matrices, a fast algorithm for GLDA is also developed. Our extensive experiments on seven public cancer datasets show that the method performs well. In particular, on difficult instances with very small samples-to-genes ratios per class, our method achieves much higher accuracies than widely used classification methods such as support vector machines and random forests.
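The singularity issue described in the abstract can be illustrated with a small sketch. The code below builds the within-class and between-class scatter matrices for synthetic data with far more features than samples, shows that the within-class scatter matrix is rank-deficient, and then projects the data to (number of classes - 1) dimensions using a Moore-Penrose pseudoinverse in place of the ordinary inverse. This pseudoinverse variant is a common workaround chosen here purely for illustration; it is not the GLDA algorithm of the paper, whose details are not given in the abstract. The function names, data sizes, and random data are all assumptions.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return the within-class (S_w) and between-class (S_b) scatter matrices."""
    mean_all = X.mean(axis=0)
    p = X.shape[1]
    S_w = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    return S_w, S_b

def pinv_lda_projection(X, y):
    """Illustrative pseudoinverse LDA (NOT the paper's GLDA).

    Uses pinv(S_w) instead of inv(S_w), so it still runs when S_w is singular,
    and keeps the (number of classes - 1) leading eigenvectors.
    """
    S_w, S_b = scatter_matrices(X, y)
    n_components = len(np.unique(y)) - 1
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real

# Synthetic stand-in for expression data: 30 samples, 200 "genes", 3 classes.
rng = np.random.default_rng(0)
n_per_class, p, n_classes = 10, 200, 3
y = np.repeat(np.arange(n_classes), n_per_class)
class_means = rng.normal(scale=2.0, size=(n_classes, p))
X = class_means[y] + rng.normal(size=(y.size, p))

S_w, _ = scatter_matrices(X, y)
rank_sw = np.linalg.matrix_rank(S_w)
print("features:", p, "rank of S_w:", rank_sw)  # rank is below p, so S_w is singular

W = pinv_lda_projection(X, y)
Z = X @ W  # samples mapped to a (n_classes - 1)-dimensional space
print("projected shape:", Z.shape)
```

With fewer samples than features, the rank of the within-class scatter matrix cannot exceed the number of samples minus the number of classes, which is why an ordinary inverse is unavailable in this setting.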
