Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis.

Affiliations

Anesthesiology and Intensive Care Medicine, University Hospital Greifswald, Ferdinand-Sauerbruch-Straße, D-17475 Greifswald, Germany.

Epidemiology and Biostatistics, School of Public Health, Imperial College London, Norfolk Place, London, W2 1PG, UK.

Publication information

Bioinformatics. 2015 Oct 1;31(19):3156-62. doi: 10.1093/bioinformatics/btv334. Epub 2015 May 28.

DOI:10.1093/bioinformatics/btv334

PMID:26026136

Abstract

MOTIVATION

Proteomic mass spectrometry analysis is becoming routine in clinical diagnostics, for example to monitor cancer biomarkers using blood samples. However, differential proteomics and the identification of peaks relevant for class separation remain challenging.

RESULTS

Here, we introduce a simple yet effective approach for identifying differentially expressed proteins using binary discriminant analysis. This approach works by data-adaptive thresholding of protein expression values and subsequent ranking of the dichotomized features using a relative entropy measure. Our framework may be viewed as a generalization of the 'peak probability contrast' approach of Tibshirani et al. (2004) and can be applied in both the two-group and the multi-group setting. Our approach is computationally inexpensive, and in the analysis of a large-scale drug discovery test dataset it achieves prediction accuracy equivalent to that of a random forest. Furthermore, in the analysis of mass spectrometry data from a pancreatic cancer study, we identified biologically relevant and statistically predictive marker peaks that were not recognized in the original study.
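The two steps described above (data-adaptive thresholding, then ranking of the dichotomized peaks by relative entropy) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the binda implementation. We assume that a peak is coded 1 when its intensity exceeds the threshold, that each peak's threshold is picked among its observed intensities so as to maximize the score, and that the relative entropy measure is the class-weighted Kullback-Leibler divergence sum_k pi_k * D(Bern(mu_k) || Bern(mu)). Here mu_k is the frequency of 1s in class k, mu is the pooled frequency, and the sum is the mutual information between the dichotomized peak and the class label. binda's actual score, frequency estimation and threshold search may differ in detail. All function names are hypothetical.

```python
import math
from collections import Counter


def bernoulli_kl(p, q):
    """Relative entropy D(Bern(p) || Bern(q)) in nats, using 0 * log 0 = 0."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total


def entropy_score(bits, labels):
    """Class-weighted relative entropy sum_k pi_k * D(Bern(mu_k) || Bern(mu)).

    This equals the mutual information between the binary feature and the
    class label, and works for two or more classes.
    """
    n = len(labels)
    mu = sum(bits) / n  # pooled frequency of 1s (= sum_k pi_k * mu_k)
    score = 0.0
    for k, n_k in Counter(labels).items():
        mu_k = sum(b for b, lab in zip(bits, labels) if lab == k) / n_k
        score += (n_k / n) * bernoulli_kl(mu_k, mu)
    return score


def dichotomize(values, threshold):
    """Code a peak as present (1) if its intensity exceeds the threshold."""
    return [1 if v > threshold else 0 for v in values]


def best_threshold(values, labels):
    """Data-adaptive threshold: try each observed intensity, keep the best score."""
    best_t, best_s = None, -1.0
    for t in sorted(set(values)):
        s = entropy_score(dichotomize(values, t), labels)
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s


def rank_peaks(X, labels):
    """Rank peaks (columns of X) by their best dichotomized score.

    X is a list of samples, each a list of peak intensities.
    Returns (peak index, threshold, score) tuples, most informative first.
    """
    ranking = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        t, s = best_threshold(column, labels)
        ranking.append((j, t, s))
    return sorted(ranking, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Toy data: 4 spectra x 2 peaks, two groups.
    X = [[5.0, 1.0], [6.0, 2.0], [1.0, 1.5], [2.0, 2.5]]
    labels = ["cancer", "cancer", "control", "control"]
    for peak, threshold, score in rank_peaks(X, labels):
        print(peak, threshold, round(score, 4))
```

In the toy data, peak 0 separates the two groups perfectly at threshold 2.0 and is ranked first with score ln 2 (about 0.6931), the maximum possible for two equally sized groups. Because the score sums over classes, the same code applies unchanged to the multi-group setting. In the two-group case, comparing the class frequencies mu_k of a thresholded peak is the kind of contrast that peak probability contrast is built on.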

AVAILABILITY AND IMPLEMENTATION

The methodology for binary discriminant analysis is implemented in the R package binda, which is freely available under the GNU General Public License (version 3 or later) from CRAN at URL http://cran.r-project.org/web/packages/binda/. R scripts reproducing all described analyses are available from the web page http://strimmerlab.org/software/binda/.

CONTACT

k.strimmer@imperial.ac.uk.

Similar articles

Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis.

Bioinformatics. 2015 Oct 1;31(19):3156-62. doi: 10.1093/bioinformatics/btv334. Epub 2015 May 28.

MALDIquant: a versatile R package for the analysis of mass spectrometry data.

Bioinformatics. 2012 Sep 1;28(17):2270-1. doi: 10.1093/bioinformatics/bts447. Epub 2012 Jul 12.

MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments.

Bioinformatics. 2014 Sep 1;30(17):2524-6. doi: 10.1093/bioinformatics/btu305. Epub 2014 May 2.

NetWeAvers: an R package for integrative biological network analysis with mass spectrometry data.

Bioinformatics. 2013 Nov 15;29(22):2946-7. doi: 10.1093/bioinformatics/btt513. Epub 2013 Sep 4.

sfinx: an R package for the elimination of false positives from affinity purification-mass spectrometry datasets.

Bioinformatics. 2017 Jun 15;33(12):1902-1904. doi: 10.1093/bioinformatics/btx076.

Direction pathway analysis of large-scale proteomics data reveals novel features of the insulin action pathway.

Bioinformatics. 2014 Mar 15;30(6):808-14. doi: 10.1093/bioinformatics/btt616. Epub 2013 Oct 27.

A multi-model statistical approach for proteomic spectral count quantitation.

J Proteomics. 2016 Jul 20;144:23-32. doi: 10.1016/j.jprot.2016.05.032. Epub 2016 May 31.

Simultaneous and exact interval estimates for the contrast of two groups based on an extremely high dimensional variable: application to mass spec data.

Bioinformatics. 2007 Jun 15;23(12):1451-8. doi: 10.1093/bioinformatics/btm130. Epub 2007 Apr 25.

KODAMA: an R package for knowledge discovery and data mining.

Bioinformatics. 2017 Feb 15;33(4):621-623. doi: 10.1093/bioinformatics/btw705.

Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles.

BMC Bioinformatics. 2008 Jun 11;9:275. doi: 10.1186/1471-2105-9-275.

Cited by

Discrimination of Klebsiella pneumoniae and Klebsiella quasipneumoniae by MALDI-TOF Mass Spectrometry Coupled With Machine Learning.

Microbiologyopen. 2025 Aug;14(4):e70035. doi: 10.1002/mbo3.70035.

A biological reading of a palimpsest.

iScience. 2023 Apr 29;26(6):106786. doi: 10.1016/j.isci.2023.106786. eCollection 2023 Jun 16.

Discrimination of the chemotherapy resistance status of human leukemia and glioblastoma cell lines by MALDI-TOF-MS profiling.

Sci Rep. 2023 Apr 5;13(1):5596. doi: 10.1038/s41598-023-32608-2.

Discrimination of Escherichia coli, Shigella flexneri, and Shigella sonnei using lipid profiling by MALDI-TOF mass spectrometry paired with machine learning.

Microbiologyopen. 2022 Aug;11(4):e1313. doi: 10.1002/mbo3.1313.

Using MALDI-TOF spectra in epidemiological surveillance for the detection of bacterial subgroups with a possible epidemic potential.

BMC Infect Dis. 2021 Oct 28;21(1):1109. doi: 10.1186/s12879-021-06803-3.

Gingival Crevicular Fluid Peptidome Profiling in Healthy and in Periodontal Diseases.

Int J Mol Sci. 2020 Jul 24;21(15):5270. doi: 10.3390/ijms21155270.

Rapid diagnosis of periodontitis, a feasibility study using MALDI-TOF mass spectrometry.

PLoS One. 2020 Mar 13;15(3):e0230334. doi: 10.1371/journal.pone.0230334. eCollection 2020.

MALDI-TOF mass spectrometry on intact bacteria combined with a refined analysis framework allows accurate classification of MSSA and MRSA.

PLoS One. 2019 Jun 27;14(6):e0218951. doi: 10.1371/journal.pone.0218951. eCollection 2019.

Spatio-temporal flowering patterns in Mediterranean Poaceae. A community study in SW Spain.

Int J Biometeorol. 2018 Apr;62(4):513-523. doi: 10.1007/s00484-017-1461-7. Epub 2017 Oct 7.

Using MALDI-TOF MS typing method to decipher outbreak: the case of Staphylococcus saprophyticus causing urinary tract infections (UTIs) in Marseille, France.

Eur J Clin Microbiol Infect Dis. 2017 Dec;36(12):2371-2377. doi: 10.1007/s10096-017-3069-6. Epub 2017 Aug 22.
