MS-REDUCE: an ultrafast technique for reduction of big mass spectrometry data for high-throughput processing.

Author information

Awan Muaaz Gul, Saeed Fahad

Affiliations

Department of Electrical and Computer Engineering and Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008, USA.

Publication information

Bioinformatics. 2016 May 15;32(10):1518-26. doi: 10.1093/bioinformatics/btw023. Epub 2016 Jan 21.

DOI:10.1093/bioinformatics/btw023

PMID:26801958

Abstract

MOTIVATION

Modern proteomics studies utilize high-throughput mass spectrometers which can produce data at an astonishing rate. These big mass spectrometry (MS) datasets can easily reach peta-scale level creating storage and analytic problems for large-scale systems biology studies. Each spectrum consists of thousands of peaks which have to be processed to deduce the peptide. However, only a small percentage of peaks in a spectrum are useful for peptide deduction as most of the peaks are either noise or not useful for a given spectrum. This redundant processing of non-useful peaks is a bottleneck for streaming high-throughput processing of big MS data. One way to reduce the amount of computation required in a high-throughput environment is to eliminate non-useful peaks. Existing noise removing algorithms are limited in their data-reduction capability and are compute intensive making them unsuitable for big data and high-throughput environments. In this paper we introduce a novel low-complexity technique based on classification, quantization and sampling of MS peaks.
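
The classification, quantization and sampling idea named above can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is in the repository linked under AVAILABILITY AND IMPLEMENTATION). The spread rule, the number of intensity levels, the `keep_fraction` parameter and all function names are assumptions made for illustration only.

```python
import random
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def classify_spread(peaks: List[Peak], n_classes: int = 4) -> int:
    """Assign a spectrum to a class from its intensity spread (hypothetical rule)."""
    intensities = [intensity for _, intensity in peaks]
    top = max(intensities)
    if top <= 0:
        return 0
    mean = sum(intensities) / len(intensities)
    spread = 1.0 - mean / top  # 0 = flat spectrum, close to 1 = few dominant peaks
    return min(int(spread * n_classes), n_classes - 1)


def reduce_spectrum(peaks: List[Peak], keep_fraction: float = 0.3, seed: int = 0) -> List[Peak]:
    """Keep roughly keep_fraction of the peaks, favouring high-intensity quanta."""
    if not peaks:
        return []
    rng = random.Random(seed)

    # 1. Classification: the class decides how many intensity levels to use (assumed).
    n_levels = classify_spread(peaks) + 2

    # 2. Quantization: bin peaks into equal-width intensity levels.
    lo = min(intensity for _, intensity in peaks)
    hi = max(intensity for _, intensity in peaks)
    width = (hi - lo) / n_levels or 1.0  # fall back to 1.0 when all intensities are equal
    quanta: List[List[Peak]] = [[] for _ in range(n_levels)]
    for peak in peaks:
        level = min(int((peak[1] - lo) / width), n_levels - 1)
        quanta[level].append(peak)

    # 3. Sampling: fill the peak budget from the highest quantum downwards,
    #    sampling at random inside the quantum that does not fit entirely.
    budget = max(1, round(keep_fraction * len(peaks)))
    kept: List[Peak] = []
    for quantum in reversed(quanta):
        take = min(len(quantum), budget - len(kept))
        kept.extend(rng.sample(quantum, take))
        if len(kept) == budget:
            break
    return sorted(kept)  # restore m/z order


# Toy spectrum: ten peaks with intensities 1.0 to 10.0.
spectrum = [(100.0 + i, float(i + 1)) for i in range(10)]
reduced = reduce_spectrum(spectrum, keep_fraction=0.3)
```

In this sketch the reduction happens before any peptide deduction step, so a downstream search engine would only ever see the retained peaks. A fixed seed keeps the random sampling reproducible.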

RESULTS

We present a novel data-reductive strategy for analysis of Big MS data. Our algorithm, called MS-REDUCE, is capable of eliminating noisy peaks as well as peaks that do not contribute to peptide deduction before any peptide deduction is attempted. Our experiments have shown up to 100× speed up over existing state of the art noise elimination algorithms while maintaining comparable high quality matches. Using our approach we were able to process a million spectra in just under an hour on a moderate server.

AVAILABILITY AND IMPLEMENTATION

The developed tool and strategy have been made available to the wider proteomics and parallel computing community, and the code can be found at https://github.com/pcdslab/MSREDUCE. Contact: fahad.saeed@wmich.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

An Out-of-Core GPU based dimensionality reduction algorithm for Big Mass Spectrometry Data and its application in bottom-up Proteomics.

ACM BCB. 2017 Aug;2017:550-555. doi: 10.1145/3107411.3107466.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification

MCtandem: an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture.

BMC Bioinformatics. 2019 Jul 17;20(1):397. doi: 10.1186/s12859-019-2980-5.

SWPepNovo: An Efficient De Novo Peptide Sequencing Tool for Large-scale MS/MS Spectra Analysis.

Int J Biol Sci. 2019 Jul 3;15(9):1787-1801. doi: 10.7150/ijbs.32142. eCollection 2019.

A dynamic wavelet-based algorithm for pre-processing tandem mass spectrometry data.

Bioinformatics. 2010 Sep 15;26(18):2242-9. doi: 10.1093/bioinformatics/btq403. Epub 2010 Jul 13.

Chemical rule-based filtering of MS/MS spectra.

Bioinformatics. 2013 Apr 1;29(7):925-32. doi: 10.1093/bioinformatics/btt061. Epub 2013 Feb 15.

Feature selection and classification of noisy proteomics mass spectrometry data based on one-bit perturbed compressed sensing.

Bioinformatics. 2020 Aug 15;36(16):4423-4431. doi: 10.1093/bioinformatics/btaa516.

Data reduction of isotope-resolved LC-MS spectra.

Bioinformatics. 2007 Jun 1;23(11):1394-400. doi: 10.1093/bioinformatics/btm083. Epub 2007 May 11.

LC-MSsim--a simulation software for liquid chromatography mass spectrometry data.

BMC Bioinformatics. 2008 Oct 8;9:423. doi: 10.1186/1471-2105-9-423.

Cited by

Denoising Search doubles the number of metabolite and exposome annotations in human plasma using an Orbitrap Astral mass spectrometer.

Nat Methods. 2025 May;22(5):1008-1016. doi: 10.1038/s41592-025-02646-x. Epub 2025 Mar 28.

doubles the number of metabolite and exposome annotations in human plasma using an Orbitrap Astral mass spectrometer.

Res Sq. 2024 Jul 25:rs.3.rs-4758843. doi: 10.21203/rs.3.rs-4758843/v1.

GPU-acceleration of the distributed-memory database peptide search of mass spectrometry data.

Sci Rep. 2023 Oct 31;13(1):18713. doi: 10.1038/s41598-023-43033-w.

Benchmarking mass spectrometry based proteomics algorithms using a simulated database.

Netw Model Anal Health Inform Bioinform. 2021;10. doi: 10.1007/s13721-021-00298-3. Epub 2021 Mar 26.

Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey.

IEEE Access. 2021;9:5497-5516. doi: 10.1109/ACCESS.2020.3047588. Epub 2020 Dec 25.

Exploring novel secondary metabolites from natural products using pre-processed mass spectral data.

Sci Rep. 2019 Nov 22;9(1):17430. doi: 10.1038/s41598-019-54078-1.

Bolt: a New Age Peptide Search Engine for Comprehensive MS/MS Sequencing Through Vast Protein Databases in Minutes.

J Am Soc Mass Spectrom. 2019 Nov;30(11):2408-2418. doi: 10.1007/s13361-019-02306-3. Epub 2019 Aug 26.

MaSS-Simulator: A Highly Configurable Simulator for Generating MS/MS Datasets for Benchmarking of Proteomics Algorithms.

Proteomics. 2018 Oct;18(20):e1800206. doi: 10.1002/pmic.201800206. Epub 2018 Sep 28.

GPU-DAEMON: GPU algorithm design, data management & optimization template for array based big omics data.

Comput Biol Med. 2018 Oct 1;101:163-173. doi: 10.1016/j.compbiomed.2018.08.015. Epub 2018 Aug 16.

ACM BCB. 2017 Aug;2017:550-555. doi: 10.1145/3107411.3107466.
