A dynamic wavelet-based algorithm for pre-processing tandem mass spectrometry data.

Affiliation

School of Mathematics and Statistics, University of Sydney, Sydney, Australia.

Publication information

Bioinformatics. 2010 Sep 15;26(18):2242-9. doi: 10.1093/bioinformatics/btq403. Epub 2010 Jul 13.

Abstract

MOTIVATION

Mass spectrometry (MS)-based proteomics is one of the most commonly used research techniques for identifying and characterizing proteins in biological and medical research. The identification of a protein is the critical first step in elucidating its biological function. Successful protein identification depends on various interrelated factors, including effective analysis of MS data generated in a proteomic experiment. This analysis comprises several stages, often combined in a pipeline or workflow. The first component of the analysis is known as spectra pre-processing. In this component, the raw data generated by the mass spectrometer is processed to eliminate noise and identify the mass-to-charge ratio (m/z) and intensity for the peaks in the spectrum corresponding to the presence of certain peptides or peptide fragments. Since all downstream analyses depend on the pre-processed data, effective pre-processing is critical to protein identification and characterization. There is a critical need for more robust pre-processing algorithms that perform well on tandem mass spectra under a variety of different conditions and can be easily integrated into sophisticated data analysis pipelines for practical wet-lab applications.

RESULTS

We have developed a new pre-processing algorithm. Based on wavelet theory, our method uses a dynamic peak model to identify peaks. It is designed to be easily integrated into a complete proteomic analysis workflow. We compared the method with other available algorithms using a reference library of raw MS and tandem MS spectra with known protein composition information. Our pre-processing algorithm results in the identification of significantly more peptides and proteins in the downstream analysis for a given false discovery rate.
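To make the general idea of wavelet-based peak picking concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a fixed Mexican-hat (Ricker) wavelet in place of the dynamic peak model described above, a spectrum sampled on a uniform grid, and a simple noise threshold based on the median absolute deviation. The function names `ricker` and `cwt_peaks` and the parameters `widths` and `snr` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ricker(points, a):
    """Mexican-hat (Ricker) wavelet with width a, sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)


def cwt_peaks(intensity, widths=(2, 4, 8), snr=3.0):
    """Return indices of candidate peaks in a uniformly sampled spectrum.

    Assumption: a fixed Ricker wavelet stands in for a real peak model.
    """
    intensity = np.asarray(intensity, dtype=float)
    n = len(intensity)
    rows = []
    for a in widths:
        # Odd-length kernel, never longer than the signal itself.
        half = min(int(5 * a), (n - 1) // 2)
        kernel = ricker(2 * half + 1, a)
        rows.append(np.convolve(intensity, kernel, mode="same"))
    # Strongest response over all scales at each position.
    response = np.max(np.vstack(rows), axis=0)
    # Robust noise estimate from the finest-scale coefficients (MAD).
    finest = rows[0]
    noise = np.median(np.abs(finest - np.median(finest))) / 0.6745
    threshold = snr * noise
    # Local maxima of the response that exceed the noise threshold.
    inner = response[1:-1]
    is_max = (inner > response[:-2]) & (inner >= response[2:])
    return np.where(is_max & (inner > threshold))[0] + 1


if __name__ == "__main__":
    # Synthetic spectrum: two Gaussian peaks plus weak noise (illustrative only).
    rng = np.random.default_rng(0)
    x = np.arange(1000)
    spectrum = np.exp(-0.5 * ((x - 200) / 4.0) ** 2)
    spectrum += 0.5 * np.exp(-0.5 * ((x - 600) / 4.0) ** 2)
    spectrum += rng.normal(0.0, 0.01, size=x.size)
    print(cwt_peaks(spectrum))
```

In a real pipeline the returned indices would be mapped back to m/z values and paired with intensities before being passed to the downstream search engine. The choice of scales and of the threshold strongly affects how many spurious peaks survive, which is the part a dynamic peak model is meant to handle better.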

AVAILABILITY

Software available at: http://www.maths.usyd.edu.au/u/penghao/index.html.
