MS-REDUCE: an ultrafast technique for reduction of big mass spectrometry data for high-throughput processing.

Author information

Awan Muaaz Gul, Saeed Fahad

Affiliations

Department of Electrical and Computer Engineering and.

Department of Electrical and Computer Engineering and Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008, USA.

Publication information

Bioinformatics. 2016 May 15;32(10):1518-26. doi: 10.1093/bioinformatics/btw023. Epub 2016 Jan 21.

Abstract

MOTIVATION

Modern proteomics studies use high-throughput mass spectrometers that can produce data at an astonishing rate. These big mass spectrometry (MS) datasets can easily reach peta-scale, creating storage and analysis problems for large-scale systems biology studies. Each spectrum consists of thousands of peaks that must be processed to deduce the peptide. However, only a small percentage of the peaks in a spectrum are useful for peptide deduction; most are either noise or uninformative for the given spectrum. This redundant processing of non-useful peaks is a bottleneck for streaming high-throughput processing of big MS data. One way to reduce the computation required in a high-throughput environment is to eliminate non-useful peaks. Existing noise-removal algorithms are limited in their data-reduction capability and are compute-intensive, making them unsuitable for big data and high-throughput environments. In this paper we introduce a novel low-complexity technique based on classification, quantization and sampling of MS peaks.
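The idea of classifying a spectrum, quantizing its peaks by intensity and then sampling from the quantized levels can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the repository listed under Availability). The function names, the spread thresholds, the equal-width bins and the rule of filling from the highest-intensity level downwards are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def classify_spread(peaks: List[Peak]) -> int:
    """Choose a number of quantization levels from the intensity spread.

    Hypothetical rule, for illustration only: a wider relative spread
    gets more levels. The thresholds are not taken from the paper.
    """
    intensities = [i for _, i in peaks]
    top = max(intensities)
    if top <= 0:
        return 1
    spread = (top - min(intensities)) / top
    if spread < 0.5:
        return 2
    if spread < 0.9:
        return 4
    return 8


def quantize(peaks: List[Peak], levels: int) -> List[List[Peak]]:
    """Group peaks into equal-width intensity bins, lowest bin first."""
    intensities = [i for _, i in peaks]
    lo, hi = min(intensities), max(intensities)
    width = (hi - lo) / levels or 1.0  # avoid a zero width for flat spectra
    bins: List[List[Peak]] = [[] for _ in range(levels)]
    for peak in peaks:
        idx = min(int((peak[1] - lo) / width), levels - 1)
        bins[idx].append(peak)
    return bins


def reduce_spectrum(peaks: List[Peak], keep_fraction: float,
                    seed: int = 0) -> List[Peak]:
    """Keep roughly `keep_fraction` of the peaks, favouring intense ones."""
    if not peaks:
        return []
    rng = random.Random(seed)
    target = max(1, round(keep_fraction * len(peaks)))
    bins = quantize(peaks, classify_spread(peaks))
    kept: List[Peak] = []
    # Walk from the highest-intensity bin down: take whole bins while they
    # fit, then randomly sample from the first bin that would overflow.
    for level in reversed(bins):
        remaining = target - len(kept)
        if remaining <= 0:
            break
        if len(level) <= remaining:
            kept.extend(level)
        else:
            kept.extend(rng.sample(level, remaining))
    return sorted(kept)  # restore m/z order
```

Calling `reduce_spectrum(peaks, 0.3)` on a list of `(m/z, intensity)` tuples returns about 30% of the peaks in m/z order. Each spectrum is handled in a single pass plus one sort, which is the kind of low per-spectrum cost the abstract has in mind.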

RESULTS

We present a novel data-reduction strategy for the analysis of big MS data. Our algorithm, called MS-REDUCE, eliminates noisy peaks, as well as peaks that do not contribute to peptide deduction, before any peptide deduction is attempted. Our experiments show up to 100× speedup over existing state-of-the-art noise-elimination algorithms while maintaining comparably high-quality matches. Using our approach, we were able to process a million spectra in just under an hour on a moderate server.
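As a rough reading of the throughput figure, taking one hour as an upper bound on the run time (an assumption, since the exact time is not given here), a million spectra in under an hour corresponds to at least:

```latex
\frac{10^{6}\ \text{spectra}}{3600\ \text{s}} \approx 278\ \text{spectra/s},
\qquad
\frac{3600\ \text{s}}{10^{6}\ \text{spectra}} = 3.6\ \text{ms per spectrum}
```

That is, the reported run averages no more than about 3.6 ms per spectrum.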

AVAILABILITY AND IMPLEMENTATION

The developed tool and strategy have been made available to the wider proteomics and parallel computing community, and the code can be found at https://github.com/pcdslab/MSREDUCE

Contact: fahad.saeed@wmich.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
