Exploring the within- and between-class correlation distributions for tumor classification.

Affiliation

Department of Statistics, University of California, 8125 Math Sciences Building, Box 951554, Los Angeles, CA 90095-1554, USA.

Publication information

Proc Natl Acad Sci U S A. 2010 Apr 13;107(15):6737-42. doi: 10.1073/pnas.0910140107. Epub 2010 Mar 25.

Abstract

To many biomedical researchers, effective tumor classification methods such as the support vector machine often appear to be a black box, not only because the procedures are complex but also because the required specifications, such as the choice of a kernel function, lack clear guidance, either mathematical or biological. As commonly observed, samples within the same tumor class tend to be more similar in gene expression than samples from different tumor classes. But can this widely accepted observation lead to a useful procedure for classification and prediction? To address this question, we first conceived a statistical framework and derived general conditions that serve as the theoretical foundation for the aforementioned empirical observation. We then constructed a classification procedure that fully utilizes the information obtained by comparing the distribution of within-class correlations with that of between-class correlations via the Kullback-Leibler divergence. We compared our approach with many machine-learning techniques by applying them to 22 binary and multiclass gene-expression datasets involving human cancers. The results showed that our method performed as effectively as the support vector machine and naïve Bayes and outperformed other learning methods (decision trees, linear discriminant analysis, and k-nearest neighbor). In addition, we conducted a simulation study and showed that our method is more effective when newly arriving samples are subject to the often-encountered problems of baseline shift or increased noise level. Our method can be extended to general classification problems in which only the similarity scores between samples are available.
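The abstract does not spell out the classification procedure, so the following is only a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes Pearson correlation as the similarity score, histogram estimates of the two correlation distributions, and a decision rule that tentatively assigns the new sample to each class and keeps the label that maximizes the Kullback-Leibler divergence between the within-class and between-class distributions. All function names, the bin count, the smoothing constant, and the synthetic data are hypothetical choices made for illustration.

```python
import numpy as np

def pairwise_corr(A, B):
    """Pearson correlation between every row of A and every row of B."""
    A = A - A.mean(axis=1, keepdims=True)
    B = B - B.mean(axis=1, keepdims=True)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return np.clip(A @ B.T, -1.0, 1.0)

def histogram_density(values, bins, eps=1e-6):
    """Smoothed histogram estimate; eps (an assumption) avoids log(0)."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts.astype(float) + eps
    return p / p.sum()

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence KL(p || q)."""
    return float(np.sum(p * np.log(p / q)))

def within_between_kl(X, y, n_bins=20):
    """KL divergence between within- and between-class correlations."""
    bins = np.linspace(-1.0, 1.0, n_bins + 1)
    C = pairwise_corr(X, X)
    same = y[:, None] == y[None, :]
    upper = np.triu(np.ones(C.shape, dtype=bool), k=1)  # each pair once
    within = C[same & upper]
    between = C[~same & upper]
    return kl_divergence(histogram_density(within, bins),
                         histogram_density(between, bins))

def classify(X, y, x_new):
    """Assumed rule: keep the tentative label that maximizes the KL."""
    classes = np.unique(y)
    X_aug = np.vstack([X, x_new])
    scores = [within_between_kl(X_aug, np.append(y, k)) for k in classes]
    return classes[int(np.argmax(scores))]

# Synthetic demonstration data (not from the paper): two classes of
# 10 samples x 200 genes, each class scattered around its own profile.
rng = np.random.default_rng(0)
profile_a = rng.normal(size=200)
profile_b = rng.normal(size=200)
X = np.vstack([profile_a + 0.5 * rng.normal(size=(10, 200)),
               profile_b + 0.5 * rng.normal(size=(10, 200))])
y = np.array([0] * 10 + [1] * 10)

x_new = profile_a + 0.5 * rng.normal(size=200)  # drawn from class 0
predicted = classify(X, y, x_new)
separation = within_between_kl(X, y)
print(predicted, separation)
```

Because Pearson correlation is unchanged when a constant is added to a sample or the sample is rescaled by a positive factor, a rule built on correlations is insensitive to that kind of baseline shift, which is consistent with the robustness claim in the abstract. The procedure only ever touches the matrix of pairwise scores, so any other similarity measure could be substituted in `pairwise_corr`.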
