Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery.

Affiliation

Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S7. doi: 10.1186/1471-2105-12-S1-S7.

DOI:10.1186/1471-2105-12-S1-S7

PMID:21342590

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3044315/

Abstract

BACKGROUND

Although high-throughput, microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they remain far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. High-dimensional, heterogeneous tumor profiles challenge current machine learning methods because they combine a small number of samples with a large, or even huge, number of variables (genes). This calls for effective feature selection in microarray data classification.

METHODS

We propose a novel feature selection method for large-scale gene expression data: multi-resolution independent component analysis (MICA). The method overcomes the weaknesses of widely used transform-based feature selection methods, such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF), by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of MICA in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA), which attain high-performance classification in low-dimensional spaces.
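The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way such a pipeline might look, not the authors' implementation. It assumes that "multi-resolution" means a wavelet decomposition of each sample profile, that the redundant global features are the coarsest-level approximation coefficients, and that suppressing them means replacing them with their rank-one (first principal component) approximation. The Haar wavelet, the simplified FastICA, and every function name here are illustrative assumptions.

```python
import numpy as np

def haar_step(x):
    """One Haar analysis level along the last axis (length must be even)."""
    a = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2.0)  # coarse (global) part
    d = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2.0)  # detail (local) part
    return a, d

def haar_inverse_step(a, d):
    """Invert haar_step exactly."""
    x = np.empty(a.shape[:-1] + (2 * a.shape[-1],))
    x[..., 0::2] = (a + d) / np.sqrt(2.0)
    x[..., 1::2] = (a - d) / np.sqrt(2.0)
    return x

def rank_one(M):
    """Column mean plus the first principal component of M (samples x coefficients)."""
    mean = M.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(M - mean, full_matrices=False)
    return mean + s[0] * np.outer(U[:, 0], Vt[0])

def suppress_global(X, levels=3):
    """Assumed suppression step: compress the coarsest approximation, keep all details.

    X is samples x genes; the number of genes must be divisible by 2**levels.
    """
    a, details = X, []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    a = rank_one(a)
    for d in reversed(details):
        a = haar_inverse_step(a, d)
    return a

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which has orthonormal rows."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W

def simple_fast_ica(X, k, n_iter=200, tol=1e-6, seed=0):
    """Simplified symmetric FastICA (tanh nonlinearity) on a samples x genes matrix."""
    p = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)      # center each sample across genes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Vt[:k] * np.sqrt(p)                     # whitened data: Z Z^T / p = I
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / p - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    S = W @ Z                # independent components, k x genes
    A = Xc @ S.T / p         # least-squares mixing matrix, samples x k
    return A, S

# Toy data only: 12 hypothetical samples with 64 hypothetical genes.
rng = np.random.default_rng(0)
X = rng.standard_normal((12, 64))
A, S = simple_fast_ica(suppress_global(X, levels=3), k=4)
```

In a sketch like this, each row of `A` would serve as the low-dimensional representation of one sample, which could then be passed to an SVM or LDA classifier in the spirit of MICA-SVM and MICA-LDA.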

RESULTS

We demonstrate the superiority and stability of our algorithms through comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross-validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical sensitivities and specificities but also show stronger performance stability than their peers. Software implementing the major algorithm, together with the data sets on which this paper focuses, is freely available at https://sites.google.com/site/heyaumapbc2011/.
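For readers unfamiliar with the reported metrics, the following is a minimal sketch of how sensitivity and specificity are computed from predicted labels, and of how samples might be split into folds for cross-validation. The abstract does not state the exact cross-validation protocol, so the interleaved fold assignment, the function names, and the toy labels are all assumptions made for illustration.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else float("nan")  # true positive rate
    spec = tn / (tn + fp) if tn + fp else float("nan")  # true negative rate
    return sens, spec

def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; sample i goes to fold i % k."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Toy labels only (1 = tumor, 0 = normal); not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In a cross-validation run, a classifier would be fitted on each `train` split and scored on the matching `test` split, and the spread of the per-fold metrics gives a rough view of performance stability.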

CONCLUSIONS

This work suggests a new direction for accelerating the move of microarray technologies into clinical routine: building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a 'profile-biomarker'. Multi-resolution data analysis, which suppresses redundant global features and effectively extracts local features, can also have a positive impact on large-scale 'omics' data mining.
