How Much Storage Precision Can Be Lost: Guidance for Near-Lossless Compression of Untargeted Metabolomics Mass Spectrometry Data.

Affiliations

Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China.

Key Laboratory of Tropical Medicinal Plant Chemistry of Ministry of Education, College of Chemistry and Chemical Engineering, Hainan Normal University, Haikou 571158, Hainan, China.

Publication information

J Proteome Res. 2024 May 3;23(5):1702-1712. doi: 10.1021/acs.jproteome.3c00851. Epub 2024 Apr 19.

Abstract

Several lossy compressors have achieved superior compression rates for mass spectrometry (MS) data at the cost of storage precision. Currently, the impacts of precision losses on MS data processing have not been thoroughly evaluated, which is critical for the future development of lossy compressors. We first evaluated different storage precision (32 bit and 64 bit) in lossless mzML files. We then applied 10 truncation transformations to generate precision-lossy files: five relative errors for intensities and five absolute errors for m/z values. MZmine3 and XCMS were used for feature detection and GNPS for compound annotation. Lastly, we compared , , 1 - , and file sizes between lossy files and lossless files under different conditions. Overall, we revealed that the discrepancy between 32 and 64 bit precision was under 1%. We proposed an absolute m/z error of 10 and a relative intensity error of 2 × 10, adhering to a 5% error threshold (1 - above 95%). For a stricter 1% error threshold (1 - above 99%), an absolute m/z error of 2 × 10 and a relative intensity error of 2 × 10 were advised. This guidance aims to help researchers improve lossy compression algorithms and minimize the negative effects of precision losses on downstream data processing.
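The abstract does not say how the truncation transformations were implemented, so the following is only a minimal sketch of one plausible reading: an absolute error bound enforced by snapping m/z values down to a fixed grid, a relative error bound enforced by zeroing low mantissa bits of a float64 intensity, and a 64 bit to 32 bit round trip. The function names (`truncate_absolute`, `truncate_relative`, `to_float32`) are hypothetical, and any error bound passed to them is an illustrative input, not a value recommended by the paper.

```python
import math
import struct


def truncate_absolute(value: float, max_abs_error: float) -> float:
    """Snap a value (e.g. m/z) down to a grid of spacing max_abs_error.

    The result differs from the input by less than max_abs_error,
    up to ordinary floating-point rounding. Hypothetical helper.
    """
    return math.floor(value / max_abs_error) * max_abs_error


def truncate_relative(value: float, max_rel_error: float) -> float:
    """Zero low mantissa bits of a float64 (e.g. an intensity).

    Keeping k mantissa bits of a normal float64 changes it by a relative
    amount below 2**-k, so k = ceil(-log2(max_rel_error)) is sufficient.
    Assumes a finite, normal (non-subnormal) input. Hypothetical helper.
    """
    if value == 0.0 or not math.isfinite(value):
        return value
    keep_bits = min(52, max(0, math.ceil(-math.log2(max_rel_error))))
    # Mask that clears the (52 - keep_bits) least significant mantissa bits.
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]


def to_float32(value: float) -> float:
    """Round-trip a float64 through 32 bit storage, as in a 32 bit mzML array."""
    return struct.unpack("<f", struct.pack("<f", value))[0]
```

Zeroed mantissa bits and grid-aligned values produce long runs of identical bytes, which is why a general-purpose compressor applied afterwards tends to yield smaller files; the trade-off studied in the abstract is how large these bounds can be before feature detection and annotation results change.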
