Data reduction in protein serial crystallography.

Affiliations

Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestr. 85, 22607 Hamburg, Germany.

Deutsches Elektronen-Synchrotron DESY, Notkestr. 85, 22607 Hamburg, Germany.

Publication information

IUCrJ. 2024 Mar 1;11(Pt 2):190-201. doi: 10.1107/S205225252400054X.

Abstract

Serial crystallography (SX) has become an established technique for protein structure determination, especially when dealing with small or radiation-sensitive crystals and investigating fast or irreversible protein dynamics. The advent of newly developed multi-megapixel X-ray area detectors, capable of capturing over 1000 images per second, has brought about substantial benefits. However, this advancement also entails a notable increase in the volume of collected data. Today, up to 2 PB of data per experiment can easily be obtained under efficient operating conditions. The combined costs associated with storing data from multiple experiments provide a compelling incentive to develop strategies that effectively reduce the amount of data stored on disk while maintaining the quality of scientific outcomes. Lossless data-compression methods are designed to preserve the information content of the data but often struggle to achieve a high compression ratio when applied to experimental data that contain noise. Conversely, lossy compression methods offer the potential to greatly reduce the data volume. Nonetheless, it is vital to thoroughly assess the impact on data quality and scientific outcomes when employing lossy compression, as it inherently involves discarding information. Evaluating the effects of lossy compression on data requires proper data quality metrics. In our research, we assess various approaches for both lossless and lossy compression techniques applied to SX data and, equally importantly, we describe metrics suitable for evaluating SX data quality.
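
The trade-off between lossless and lossy compression described in the abstract can be illustrated with a toy sketch. This is not the authors' method or data. It assumes a hypothetical flat-background frame with Gaussian noise, uses zlib as a stand-in lossless codec, and uses simple rounding to a fixed step as a stand-in lossy scheme. All function names and parameters are invented for illustration.

```python
import random
import zlib
from array import array


def simulate_frame(n_pixels=100_000, background=20.0, noise_sigma=5.0, seed=0):
    """Hypothetical noisy frame: Gaussian noise on a flat background, clipped to 16-bit counts."""
    rng = random.Random(seed)
    return [
        max(0, min(65535, round(rng.gauss(background, noise_sigma))))
        for _ in range(n_pixels)
    ]


def compression_ratio(pixels):
    """Raw size divided by zlib-compressed size for 16-bit unsigned pixel values."""
    raw = array("H", pixels).tobytes()
    return len(raw) / len(zlib.compress(raw, 6))


def quantize(pixels, step):
    """Lossy step (assumed scheme): round each value to the nearest multiple of `step`."""
    return [min(65535, ((p + step // 2) // step) * step) for p in pixels]


def max_abs_error(original, lossy):
    """Largest per-pixel deviation introduced by the lossy step."""
    return max(abs(a - b) for a, b in zip(original, lossy))


if __name__ == "__main__":
    frame = simulate_frame()
    lossy_frame = quantize(frame, step=8)
    print("lossless ratio:", round(compression_ratio(frame), 2))
    print("lossy ratio:   ", round(compression_ratio(lossy_frame), 2))
    print("max abs error: ", max_abs_error(frame, lossy_frame))
```

In this sketch the noise limits how far zlib can shrink the unmodified frame, while quantization raises the ratio at the cost of a bounded per-pixel error. Whether such an error is acceptable for real SX data is the question that the data quality metrics mentioned in the abstract are meant to answer.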

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c44b/10916297/a5e442fd2a05/m-11-00190-fig1.jpg
