A dynamic wavelet-based algorithm for pre-processing tandem mass spectrometry data.

Author information

School of Mathematics and Statistics, University of Sydney, Sydney, Australia.

Publication information

Bioinformatics. 2010 Sep 15;26(18):2242-9. doi: 10.1093/bioinformatics/btq403. Epub 2010 Jul 13.

DOI:10.1093/bioinformatics/btq403

PMID:20628072

Abstract

MOTIVATION

Mass spectrometry (MS)-based proteomics is one of the most commonly used research techniques for identifying and characterizing proteins in biological and medical research. The identification of a protein is the critical first step in elucidating its biological function. Successful protein identification depends on various interrelated factors, including effective analysis of MS data generated in a proteomic experiment. This analysis comprises several stages, often combined in a pipeline or workflow. The first component of the analysis is known as spectra pre-processing. In this component, the raw data generated by the mass spectrometer is processed to eliminate noise and identify the mass-to-charge ratio (m/z) and intensity for the peaks in the spectrum corresponding to the presence of certain peptides or peptide fragments. Since all downstream analyses depend on the pre-processed data, effective pre-processing is critical to protein identification and characterization. There is a critical need for more robust pre-processing algorithms that perform well on tandem mass spectra under a variety of different conditions and can be easily integrated into sophisticated data analysis pipelines for practical wet-lab applications.

RESULT

We have developed a new pre-processing algorithm. Based on wavelet theory, our method uses a dynamic peak model to identify peaks. It is designed to be easily integrated into a complete proteomic analysis workflow. We compared the method with other available algorithms using a reference library of raw MS and tandem MS spectra with known protein composition information. Our pre-processing algorithm results in the identification of significantly more peptides and proteins in the downstream analysis for a given false discovery rate.
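The abstract does not spell out the dynamic peak model, so the following is only a generic illustration of wavelet-based peak picking, not the authors' algorithm. It is a minimal sketch assuming an evenly sampled profile spectrum, a Mexican-hat (Ricker) wavelet, and a fixed set of candidate peak widths; taking the strongest response across widths is a loose stand-in for a peak model that adapts to peak shape. The function names `ricker` and `pick_peaks`, the width list, and the signal-to-noise threshold are all hypothetical choices made for this example.

```python
import numpy as np

def ricker(points, a):
    """Sample a Mexican-hat (Ricker) wavelet of width `a` at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def pick_peaks(mz, intensity, widths=(2, 3, 4, 6), snr=5.0):
    """Return (m/z, intensity) arrays for peaks found in a profile spectrum.

    Assumes `mz` is evenly sampled and the spectrum is longer than the
    widest wavelet kernel (10 * max(widths) + 1 samples).
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # One row of wavelet coefficients per candidate peak width (in samples).
    coeffs = np.array([
        np.convolve(intensity, ricker(10 * w + 1, w), mode="same")
        for w in widths
    ])
    # Keep, at every m/z position, the strongest response across widths.
    best = coeffs.max(axis=0)
    # Robust noise level from the finest scale (median absolute deviation).
    finest = coeffs[0]
    noise = np.median(np.abs(finest - np.median(finest))) / 0.6745
    # A peak is a local maximum of the response that clears the noise threshold.
    is_max = (best[1:-1] > best[:-2]) & (best[1:-1] >= best[2:])
    idx = np.flatnonzero(is_max & (best[1:-1] > snr * noise)) + 1
    return mz[idx], intensity[idx]

# Synthetic profile spectrum: two Gaussian peaks plus Gaussian noise.
rng = np.random.default_rng(0)
mz = np.arange(400.0, 800.0, 0.1)
signal = 100 * np.exp(-0.5 * ((mz - 500.0) / 0.3) ** 2)
signal += 60 * np.exp(-0.5 * ((mz - 700.0) / 0.3) ** 2)
peak_mz, peak_int = pick_peaks(mz, signal + rng.normal(0.0, 1.0, mz.size))
print(np.round(peak_mz, 1))
```

The output is a peak list of m/z and intensity pairs, which is the form a downstream database search would typically consume. Raising `snr` trades sensitivity for fewer spurious peaks; a real pipeline would tune this against an identification false discovery rate rather than fix it in advance.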

AVAILABILITY

Software available at: http://www.maths.usyd.edu.au/u/penghao/index.html.

Similar articles

VEMS 3.0: algorithms and computational tools for tandem mass spectrometry based identification of post-translational modifications in proteins.

J Proteome Res. 2005 Nov-Dec;4(6):2338-47. doi: 10.1021/pr050264q.

Proteomic data analysis workflow for discovery of candidate biomarker peaks predictive of clinical outcome for patients with acute myeloid leukemia.

J Proteome Res. 2008 Jun;7(6):2332-41. doi: 10.1021/pr070482e. Epub 2008 May 2.

Isotopic peak intensity ratio based algorithm for determination of isotopic clusters and monoisotopic masses of polypeptides from high-resolution mass spectrometric data.

Anal Chem. 2008 Oct 1;80(19):7294-303. doi: 10.1021/ac800913b. Epub 2008 Aug 28.

Algorithms and tools for analysis and management of mass spectrometry data.

Brief Bioinform. 2008 Mar;9(2):144-55. doi: 10.1093/bib/bbn007. Epub 2008 Mar 20.

Filtering strategies for improving protein identification in high-throughput MS/MS studies.

Proteomics. 2009 Feb;9(4):848-60. doi: 10.1002/pmic.200800517.

Binomial probability distribution model-based protein identification algorithm for tandem mass spectrometry utilizing peak intensity information.

J Proteome Res. 2013 Jan 4;12(1):328-35. doi: 10.1021/pr300781t. Epub 2012 Nov 29.

i-RUBY: a novel software for quantitative analysis of highly accurate shotgun-proteomics liquid chromatography/tandem mass spectrometry data obtained without stable-isotope labeling of proteins.

Rapid Commun Mass Spectrom. 2011 Apr 15;25(7):960-8. doi: 10.1002/rcm.4943. Epub 2011 Mar 14.

Highly accelerated feature detection in proteomics data sets using modern graphics processing units.

Bioinformatics. 2009 Aug 1;25(15):1937-43. doi: 10.1093/bioinformatics/btp294. Epub 2009 May 14.

Fast tandem mass spectra-based protein identification regardless of the number of spectra or potential modifications examined.

Bioinformatics. 2005 May 15;21(10):2177-84. doi: 10.1093/bioinformatics/bti362. Epub 2005 Mar 3.

Cited by

DEIMoS: An Open-Source Tool for Processing High-Dimensional Mass Spectrometry Data.

Anal Chem. 2022 Apr 26;94(16):6130-6138. doi: 10.1021/acs.analchem.1c05017. Epub 2022 Apr 17.

Wavelet-based peak detection and a new charge inference procedure for MS/MS implemented in ProteoWizard's msConvert.

J Proteome Res. 2015 Feb 6;14(2):1299-307. doi: 10.1021/pr500886y. Epub 2014 Dec 2.

A simple method for predicting transmembrane proteins based on wavelet transform.

Int J Biol Sci. 2013;9(1):22-33. doi: 10.7150/ijbs.5371. Epub 2012 Dec 19.

Wavelet-based method for time-domain noise analysis and reduction in a frequency-scan ion trap mass spectrometer.

J Am Soc Mass Spectrom. 2012 Nov;23(11):1855-64. doi: 10.1007/s13361-012-0455-2. Epub 2012 Aug 21.
