Numerical compression schemes for proteomics mass spectrometry data.

Author information

Teleman Johan, Dowsey Andrew W, Gonzalez-Galarza Faviel F, Perkins Simon, Pratt Brian, Röst Hannes L, Malmström Lars, Malmström Johan, Jones Andrew R, Deutsch Eric W, Levander Fredrik

Affiliations

Department of Immunotechnology, Lund University, Medicon Village building 406, 223 81 Lund, Sweden;

Institute of Human Development, Faculty of Medical and Human Sciences, University of Manchester, United Kingdom;

Centre for Advanced Discovery and Experimental Therapeutics (CADET), University of Manchester and Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Sciences Centre, Oxford Road, Manchester M13 9WL, United Kingdom;

Publication information

Mol Cell Proteomics. 2014 Jun;13(6):1537-42. doi: 10.1074/mcp.O114.037879. Epub 2014 Mar 27.

Abstract

The open XML format mzML, used to represent MS data, is pivotal for the development of platform-independent MS analysis software. Although conversion from vendor formats to mzML must take place on a platform where the vendor libraries are available (i.e., Windows), once mzML files have been generated, they can be used on any platform. However, the mzML format has turned out to be less efficient than vendor formats: in many cases, the naïve mzML representation is four- to 18-fold larger than the original vendor file. In disk I/O-limited setups, a larger data file also leads to longer processing times, which is a problem given the data production rates of modern mass spectrometers. To mitigate this problem, we present a family of numerical compression algorithms called MS-Numpress, intended for efficient compression of MS data. To ease adoption, the algorithms target the binary data in the mzML standard, and support is already available in major proteomics tools. Using a test set of 10 representative MS data files, we demonstrate typical file size decreases of 90% when MS-Numpress is combined with traditional compression, as well as read time decreases of up to 50%. We expect these improvements to benefit data handling within the MS community.
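The abstract does not describe how the encodings work internally. As a rough illustration of why a numerical transform applied before generic compression can shrink mzML binary arrays, the Python sketch below compares an mzML-style encoding of a 64-bit float array (little-endian bytes, zlib, then base64) with a hypothetical transform: values are rounded to fixed point and stored as integer residuals from a linear prediction before zlib. This is not the MS-Numpress codec. The function names, the scale factor of 1e5, and the int64 residual storage are assumptions chosen for illustration, and the rounding step makes this sketch lossy (error at most 0.5/scale per value).

```python
import base64
import struct
import zlib


def encode_float64_zlib_b64(values):
    """mzML-style binary array: little-endian float64 -> zlib -> base64."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(zlib.compress(raw))


def encode_fixed_point_deltas(values, scale=1e5):
    """Hypothetical transform (NOT the MS-Numpress codec).

    Rounds each value to a fixed-point integer, keeps the first two
    integers as-is, and stores every later value as the residual from a
    linear prediction (2*prev - prevprev) before zlib + base64.
    Lossy: the rounding error is at most 0.5 / scale per value.
    """
    ints = [round(v * scale) for v in values]
    out = ints[:2]
    for i in range(2, len(ints)):
        predicted = 2 * ints[i - 1] - ints[i - 2]  # linear extrapolation
        out.append(ints[i] - predicted)
    raw = struct.pack("<%dq" % len(out), *out)  # little-endian int64
    return base64.b64encode(zlib.compress(raw))


def decode_fixed_point_deltas(encoded, scale=1e5):
    """Inverse of encode_fixed_point_deltas (up to the rounding error)."""
    raw = zlib.decompress(base64.b64decode(encoded))
    residuals = struct.unpack("<%dq" % (len(raw) // 8), raw)
    ints = list(residuals[:2])
    for r in residuals[2:]:
        # Undo the prediction using the two previously reconstructed values.
        ints.append(r + 2 * ints[-1] - ints[-2])
    return [i / scale for i in ints]


if __name__ == "__main__":
    # Synthetic, smoothly increasing m/z axis (illustrative data only).
    mz = [400.0 + 0.01 * i + 1e-6 * i * i for i in range(5000)]
    naive = encode_float64_zlib_b64(mz)
    packed = encode_fixed_point_deltas(mz)
    restored = decode_fixed_point_deltas(packed)
    print("float64 + zlib + base64:", len(naive), "bytes")
    print("fixed-point residuals + zlib + base64:", len(packed), "bytes")
    print("max abs error:", max(abs(a - b) for a, b in zip(mz, restored)))
```

On a smooth, densely sampled m/z axis the prediction residuals are tiny integers that zlib compresses far better than raw IEEE 754 doubles; on noisy intensity arrays this particular transform would likely gain much less.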

Similar articles

Mass spectrometer output file format mzML.
Methods Mol Biol. 2010;604:319-31. doi: 10.1007/978-1-60761-444-9_22.

Fast and Efficient XML Data Access for Next-Generation Mass Spectrometry.
PLoS One. 2015 Apr 30;10(4):e0125108. doi: 10.1371/journal.pone.0125108. eCollection 2015.

Comparison of Programmatic Approaches for Efficient Accessing to mzML Files.
J Data Mining Genomics Proteomics. 2011 Jan 1;2(1). doi: 10.4172/2153-0602.1000109.

Cited by

Bacteriophages fEV-1 and fD1 Infect .
Viruses. 2021 Jul 16;13(7):1384. doi: 10.3390/v13071384.
