Sparse Proteomics Analysis - a compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data.

Author information

Conrad Tim O F, Genzel Martin, Cvetkovic Nada, Wulkow Niklas, Leichtle Alexander, Vybiral Jan, Kutyniok Gitta, Schütte Christof

Affiliations

Department of Mathematics, Freie Universität Berlin, Arnimallee 6, Berlin, Germany.

Zuse Institute Berlin, Takustr. 7, Berlin, Germany.

Publication information

BMC Bioinformatics. 2017 Mar 9;18(1):160. doi: 10.1186/s12859-017-1565-4.

DOI: 10.1186/s12859-017-1565-4

PMID: 28274197

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5343371/

Abstract

BACKGROUND

High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible.

RESULTS

We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.
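
The abstract does not spell out SPA's optimization problem. To illustrate how a compressed-sensing approach can pick a small discriminating feature set, the sketch below assumes a 1-bit compressed-sensing estimator in the style of Plan and Vershynin: maximize Σ_i y_i⟨x_i, w⟩ subject to ‖w‖₂ ≤ 1 and ‖w‖₁ ≤ R, with class labels y_i ∈ {−1, +1}. This problem has a closed-form maximizer, a soft-thresholded copy of Σ_i y_i x_i rescaled to unit length, so only the threshold has to be found. The nonzero entries of w form the selected feature set, and an unknown spectrum x is classified by the sign of ⟨x, w⟩. Treat it as a hypothetical sketch, not the published implementation: the function names, the synthetic data (two shifted "peaks" at indices 3 and 7) and the radius value R = 1.5 are illustrative assumptions.

```python
import math
import random


def soft_threshold(v, tau):
    """Shrink every entry of v towards zero by tau; entries with |v_j| <= tau become 0."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def l1_over_l2(v):
    """Return ||v||_1 / ||v||_2 (0.0 for the zero vector); small values mean v is sparse."""
    l2 = math.sqrt(sum(x * x for x in v))
    return sum(abs(x) for x in v) / l2 if l2 > 0 else 0.0


def sparse_weights(X, y, radius, iters=100):
    """Maximise sum_i y_i <x_i, w> subject to ||w||_2 <= 1 and ||w||_1 <= radius.

    The maximiser is soft_threshold(v, tau) rescaled to unit l2 norm, where
    v = sum_i y_i x_i, so only the threshold tau has to be found. The ratio
    ||.||_1 / ||.||_2 of the thresholded vector does not increase with tau,
    which makes bisection valid. Assumes labels in {-1, +1}, radius >= 1 and
    a unique largest |v_j| (otherwise the result can collapse to zero).
    """
    d = len(X[0])
    v = [sum(yi * xi[j] for xi, yi in zip(X, y)) for j in range(d)]
    if l1_over_l2(v) <= radius:
        tau = 0.0  # l1 constraint inactive: no thresholding needed
    else:
        lo, hi = 0.0, max(abs(x) for x in v)  # ratio(lo) > radius >= ratio(hi)
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if l1_over_l2(soft_threshold(v, mid)) > radius:
                lo = mid
            else:
                hi = mid
        tau = hi  # at or just above the exact threshold, so ||w||_1 <= radius
    w = soft_threshold(v, tau)
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]


def select_features(w, tol=1e-12):
    """Indices of the nonzero weights: the selected (discriminating) features."""
    return [j for j, x in enumerate(w) if abs(x) > tol]


def classify(x, w):
    """Predict +1 or -1 for a spectrum x from the sign of <x, w>."""
    return 1 if sum(a * b for a, b in zip(x, w)) >= 0 else -1


# Hypothetical toy data: 40 "spectra" with 50 features each; features 3 and 7
# are shifted in opposite directions depending on the class label.
random.seed(0)
d, m = 50, 40
X, y = [], []
for i in range(m):
    label = 1 if i % 2 == 0 else -1
    x = [random.gauss(0.0, 1.0) for _ in range(d)]
    x[3] += 2.0 * label
    x[7] -= 2.0 * label
    X.append(x)
    y.append(label)

w = sparse_weights(X, y, radius=1.5)
print("selected features:", select_features(w))
accuracy = sum(classify(x, w) == yi for x, yi in zip(X, y)) / m
print("training accuracy:", accuracy)
```

The radius R controls how many features survive: a smaller R forces a sparser w. On real spectra one would likely standardize each feature first and choose R on held-out data; both are assumptions here rather than steps described in the abstract.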

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a00b/5343371/d57313b22cc3/12859_2017_1565_Fig1_HTML.jpg

Similar articles

Feature selection and nearest centroid classification for protein mass spectrometry.

BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.

A high performance profile-biomarker diagnosis for mass spectral profiles.

BMC Syst Biol. 2011;5 Suppl 2(Suppl 2):S5. doi: 10.1186/1752-0509-5-S2-S5. Epub 2011 Dec 14.

Biomarker Signature Discovery from Mass Spectrometry Data.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Jul-Aug;11(4):766-72. doi: 10.1109/TCBB.2014.2318718.

Feature selection and machine learning with mass spectrometry data.

Methods Mol Biol. 2013;1007:237-62. doi: 10.1007/978-1-62703-392-3_10.

Feature extraction and dimensionality reduction for mass spectrometry data.

Comput Biol Med. 2009 Sep;39(9):818-23. doi: 10.1016/j.compbiomed.2009.06.012. Epub 2009 Jul 30.

Interpretation of mass spectrometry data for high-throughput proteomics.

Anal Bioanal Chem. 2003 Aug;376(7):1014-22. doi: 10.1007/s00216-003-1995-x. Epub 2003 Jul 5.

Pancreatic cancer biomarkers discovery by surface-enhanced laser desorption and ionization time-of-flight mass spectrometry.

Clin Chem Lab Med. 2009;47(6):713-23. doi: 10.1515/CCLM.2009.158.

Comparison of feature selection and classification for MALDI-MS data.

BMC Genomics. 2009 Jul 7;10 Suppl 1(Suppl 1):S3. doi: 10.1186/1471-2164-10-S1-S3.

Proteomic data analysis workflow for discovery of candidate biomarker peaks predictive of clinical outcome for patients with acute myeloid leukemia.

J Proteome Res. 2008 Jun;7(6):2332-41. doi: 10.1021/pr070482e. Epub 2008 May 2.

Cited by

Discrimination of Klebsiella pneumoniae and Klebsiella quasipneumoniae by MALDI-TOF Mass Spectrometry Coupled With Machine Learning.

Microbiologyopen. 2025 Aug;14(4):e70035. doi: 10.1002/mbo3.70035.

Automated sparse feature selection in high-dimensional proteomics data via 1-bit compressed sensing and K-Medoids clustering.

BMC Bioinformatics. 2025 Jul 1;26(1):165. doi: 10.1186/s12859-025-06193-2.

Task-adaptive eigenvector-based projection (EBP) transform for compressed sensing: A case study of spectroscopic profiling sensor.

Anal Sci Adv. 2021 Jun 29;3(1-2):29-37. doi: 10.1002/ansa.202100018. eCollection 2022 Feb.

MarkerMap: nonlinear marker selection for single-cell studies.

NPJ Syst Biol Appl. 2024 Feb 14;10(1):17. doi: 10.1038/s41540-024-00339-3.

TYROSINE KINASES: COMPLEX MOLECULAR SYSTEMS CHALLENGING COMPUTATIONAL METHODOLOGIES.

Eur Phys J B. 2021 Oct;94(10). doi: 10.1140/epjb/s10051-021-00207-7. Epub 2021 Oct 11.

Discrimination of Escherichia coli, Shigella flexneri, and Shigella sonnei using lipid profiling by MALDI-TOF mass spectrometry paired with machine learning.

Microbiologyopen. 2022 Aug;11(4):e1313. doi: 10.1002/mbo3.1313.

A rank-based marker selection method for high throughput scRNA-seq data.

BMC Bioinformatics. 2020 Oct 23;21(1):477. doi: 10.1186/s12859-020-03641-z.

MALDI-TOF mass spectrometry on intact bacteria combined with a refined analysis framework allows accurate classification of MSSA and MRSA.

PLoS One. 2019 Jun 27;14(6):e0218951. doi: 10.1371/journal.pone.0218951. eCollection 2019.

Prostate cancer recognition based on mass spectrometry sensing data and data fingerprint recovery.

Biomed Signal Process Control. 2017 Mar;33:392-399. doi: 10.1016/j.bspc.2016.12.003. Epub 2017 Jan 16.
