StackZDPD: a novel encoding scheme for mass spectrometry data optimized for speed and compression ratio.

Affiliations

Zhejiang University, Hangzhou, 310058, China.

School of Life Science, Westlake University, Hangzhou, 310023, China.

Publication information

Sci Rep. 2022 Mar 30;12(1):5384. doi: 10.1038/s41598-022-09432-1.

Abstract

Text-based mzML is the pervasive, standardized format for the interchange and deposition of raw mass spectrometry (MS) proteomics and metabolomics data, but it is used inefficiently on various analysis platforms because of its sheer data volume and limited read/write speed. Most research on compression algorithms does not provide a flexible scheme for random file reading. Database-based solutions guarantee efficient random file reading, but their compression and third-party software support remain insufficient. With decompression efficiency as a constraint, we propose "Stack-ZDPD", an encoding scheme optimized for the storage of raw MS data. It is designed for "Aird", a computation-oriented format with fast access and decoding times whose core compression algorithm is "ZDPD". Stack-ZDPD reduces the volume of data stored in mzML format by around 80% or more, depending on the data acquisition pattern, and its compression ratio is approximately 30% compared to ZDPD for data generated using Time of Flight technology. Our approach is available in AirdPro for file conversion and in the Java API Aird-SDK for data parsing.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35c4/8967824/eb1cc1a263b2/41598_2022_9432_Fig1_HTML.jpg
