Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery.

Affiliation

Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S7. doi: 10.1186/1471-2105-12-S1-S7.

Abstract

BACKGROUND

Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they are still far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. High-dimensional, heterogeneous tumor profiles challenge current machine learning methods because they combine a small number of samples with a large, or even huge, number of variables (genes). This calls for effective feature selection in microarray data classification.

METHODS

We propose a novel feature selection method, multi-resolution independent component analysis (MICA), for large-scale gene expression data. The method overcomes the weaknesses of widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of MICA in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and MICA-based linear discriminant analysis (MICA-LDA) to attain high-performance classification in low-dimensional spaces.
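The idea of suppressing global features while keeping local ones can be illustrated with a small sketch. The abstract does not say which multi-resolution transform MICA uses or how coefficients are treated, so the Haar wavelet, the `keep` damping factor, and all function names below are assumptions chosen for illustration only. The later ICA and classifier steps are not shown.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal, levels):
    """Multi-level Haar transform (assumed transform, not from the paper).

    Returns (approximation, details), with details ordered fine to coarse.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / SQRT2 for a, b in pairs])  # local changes
        approx = [(a + b) / SQRT2 for a, b in pairs]         # global trend
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, walking from the coarsest level to the finest."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(current, detail):
            rebuilt.append((s + d) / SQRT2)
            rebuilt.append((s - d) / SQRT2)
        current = rebuilt
    return current

def suppress_global(signal, levels, keep=0.0):
    """Damp the coarse approximation by `keep` (hypothetical rule), keep details."""
    approx, details = haar_decompose(signal, levels)
    return haar_reconstruct([keep * a for a in approx], details)

# Toy expression profile: a flat baseline with one local spike at index 3.
profile = [5.0, 5.0, 5.0, 9.0, 5.0, 5.0, 5.0, 5.0]
local_view = suppress_global(profile, levels=3)
print(local_view)  # baseline removed; the spike at index 3 stays the largest value
```

In this toy case a full three-level decomposition leaves a single approximation coefficient, so setting `keep=0.0` simply subtracts the profile mean. With fewer levels, the same call removes block-wise averages instead.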

RESULTS

We demonstrate the superiority and stability of our algorithms through comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross-validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical sensitivities and specificities but also show more stable performance than their peers. Software implementing the main algorithm, and the data sets on which this paper focuses, are freely available at https://sites.google.com/site/heyaumapbc2011/.
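For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity and specificity are computed from predicted labels, together with a simple k-fold index splitter. The function names, the contiguous fold layout, and the toy labels are assumptions for illustration; the abstract does not state the cross-validation scheme used.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds (assumed layout)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

# Toy example: 4 tumor samples (label 1) and 6 normal samples (label 0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In a real evaluation, the classifier would be fit on each `train` split and scored on the matching `test` split, with the two metrics averaged across folds.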

CONCLUSIONS

This work suggests a new direction for moving microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a 'profile-biomarker'. The suppression of redundant global features and extraction of effective local features through multi-resolution data analysis also have a positive impact on large-scale 'omics' data mining.
