Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, United States.
J Struct Biol. 2022 Sep;214(3):107875. doi: 10.1016/j.jsb.2022.107875. Epub 2022 Jun 17.
With larger, higher-speed detectors and improved automation, individual CryoEM instruments are capable of producing a prodigious amount of data each day, which must then be stored, processed and archived. While it has become routine to use lossless compression on raw counting-mode movies, the averages that result from correcting and summing these movies no longer compress well. These averages could be considered sufficient for long-term archival, yet they are conventionally stored with 32 bits of precision despite their high noise levels. Derived images are similarly stored with excess precision, providing an opportunity to decrease project sizes and improve processing speed. We present a simple argument, based on propagation of uncertainty, for safe bit truncation of flat-fielded images combined with lossless compression. The same method can be applied to most derived images throughout the processing pipeline. We test the proposed strategy on two standard, data-limited CryoEM data sets, demonstrating that these limits are safe for real-world use. We find that 5 bits of precision is sufficient for virtually any raw CryoEM data and that 8-12 bits is sufficient for intermediate averages or final 3-D structures. Additionally, we detail and recommend specific rules for discretization of data, as well as a practical compressed data representation tuned to the specific needs of CryoEM.
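The abstract does not spell out the discretization rules themselves, so the following is only a minimal sketch of the general idea. It rests on our own assumptions:

- pixel values are clipped to mean ± 4σ (an arbitrary window);
- the clipped values are mapped linearly onto 2^bits integer levels;
- the codes are then compressed with Python's `zlib`, which stands in for the lossless stage.

The helpers `quantize` and `dequantize` are hypothetical and are neither the authors' code nor any EMAN2 API. The reasoning they illustrate is the standard result for a uniform quantizer, not necessarily the paper's exact derivation. Rounding to a step Δ adds an error with variance of roughly Δ²/12. When Δ is a small fraction of the per-pixel noise σ, the total noise therefore grows only from σ² to about σ² + Δ²/12.

```python
import array
import math
import random
import zlib


def quantize(pixels, bits, clip_sigmas=4.0):
    """Hypothetical helper: reduce float pixels to `bits` of precision.

    Values are clipped to mean +/- clip_sigmas * std (an assumed window) and
    mapped linearly onto the integers 0 .. 2**bits - 1.
    Returns (codes, offset, step); pixels are approximately offset + code * step.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    levels = 2 ** bits - 1
    if std == 0.0:
        # A constant image needs no precision at all: every code is 0.
        return [0] * n, mean, 0.0
    offset = mean - clip_sigmas * std
    step = 2.0 * clip_sigmas * std / levels  # quantization step (Delta)
    # Round to the nearest level and clip anything outside the window.
    codes = [min(levels, max(0, round((p - offset) / step))) for p in pixels]
    return codes, offset, step


def dequantize(codes, offset, step):
    """Map integer codes back to approximate float pixel values."""
    return [offset + c * step for c in codes]


if __name__ == "__main__":
    random.seed(0)
    # Pure Gaussian noise (sigma = 1) stands in for a noisy CryoEM average.
    pixels = [random.gauss(0.0, 1.0) for _ in range(64 * 64)]

    codes, offset, step = quantize(pixels, bits=5)
    recon = dequantize(codes, offset, step)
    rms_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(pixels, recon)) / len(pixels))
    print(f"step = {step:.3f}, rms error = {rms_err:.4f}, "
          f"predicted step/sqrt(12) = {step / math.sqrt(12):.4f}")

    # Lossless stage: zlib on one byte per code vs. on the raw float32 bytes.
    float_bytes = zlib.compress(array.array("f", pixels).tobytes(), 9)
    code_bytes = zlib.compress(bytes(codes), 9)  # bits <= 8, so each code fits in a byte
    print(f"float32 + zlib: {len(float_bytes)} bytes; "
          f"5-bit codes + zlib: {len(code_bytes)} bytes")
```

With 5 bits and a ±4σ window, Δ = 8σ/31 ≈ 0.26σ, so the added rounding error is about 0.075σ. The total noise level rises by well under 1%. Meanwhile the integer codes compress to a small fraction of the size of the float32 data, whose noisy mantissa bits barely compress at all.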