A high performance profile-biomarker diagnosis for mass spectral profiles.

Author information

Han Henry

Affiliation

Department of Mathematics and Bioinformatics, Eastern Michigan University, Ypsilanti, MI 48197, USA.

Publication information

BMC Syst Biol. 2011;5 Suppl 2(Suppl 2):S5. doi: 10.1186/1752-0509-5-S2-S5. Epub 2011 Dec 14.

DOI:10.1186/1752-0509-5-S2-S5

PMID:22784576

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3287485/

Abstract

BACKGROUND

Although mass spectrometry based proteomics shows exciting promise for diagnosing complex diseases, it remains a research field rather than a clinical routine because of limitations in diagnostic accuracy and data reproducibility. Comparatively little work has addressed high-performance proteomic pattern classification, relative to the effort spent on enhancing data reproducibility.

METHODS

In this study, we present a novel machine learning approach to achieve a clinical level disease diagnosis for mass spectral data. We propose multi-resolution independent component analysis, a novel feature selection algorithm to tackle the large dimensionality of mass spectra, by following our local and global feature selection framework. We also develop high-performance classifiers by embedding multi-resolution independent component analysis in linear discriminant analysis and support vector machines.
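
The abstract does not spell out the steps of multi-resolution independent component analysis, so the following is only a minimal sketch of the multi-resolution idea it names, under stated assumptions: a spectrum is decomposed with a Haar wavelet transform, the finest-level detail coefficients (assumed here to carry mostly local noise) are zeroed, and the profile is reconstructed before any further feature extraction. The function names, the Haar basis, and the zeroing rule are all hypothetical choices for illustration, and the independent component analysis and classifier stages are not shown.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal, levels):
    """Orthonormal Haar DWT: returns (approximation, details), finest details first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / SQRT2 for a, b in pairs])  # local differences
        approx = [(a + b) / SQRT2 for a, b in pairs]         # coarser trend
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, rebuilding from the coarsest level to the finest."""
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = rebuilt
    return approx


def suppress_fine_details(signal, levels, drop_finest):
    """Zero the `drop_finest` finest detail levels (an assumed rule), then reconstruct."""
    approx, details = haar_decompose(signal, levels)
    kept = [[0.0] * len(d) if i < drop_finest else d
            for i, d in enumerate(details)]
    return haar_reconstruct(approx, kept)


if __name__ == "__main__":
    # Toy intensity values standing in for a (much longer) mass spectrum.
    spectrum = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 8.0, 0.0]
    print(suppress_fine_details(spectrum, levels=2, drop_finest=1))
```

With `drop_finest=0` the reconstruction returns the original profile, which is a quick sanity check on the transform; with `drop_finest=1` each adjacent pair of intensities is replaced by its mean while coarser structure is preserved. In a fuller pipeline the reconstructed profiles would be passed to an independent component analysis and then to a classifier such as a support vector machine, as the abstract describes.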

RESULTS

Our multi-resolution independent component based support vector machines not only achieve clinical level classification accuracy, but also overcome the weakness in traditional peak-selection based biomarker discovery. In addition to rigorous theoretical analysis, we demonstrate our method's superiority by comparing it with nine state-of-the-art classification and regression algorithms on six heterogeneous mass spectral profiles.

CONCLUSIONS

Our work not only suggests an alternative direction from machine learning to accelerate mass spectral proteomic technologies into a clinical routine by treating an input profile as a 'profile-biomarker', but also has positive impacts on large scale 'omics' data mining. Related source codes and data sets can be found at: https://sites.google.com/site/heyaumbioinformatics/home/proteomics.
