Real-time data processing for serial crystallography experiments.

Author information

White Thomas, Schoof Tim, Yakubov Sergey, Tolstikova Aleksandra, Middendorf Philipp, Karnevskiy Mikhail, Mariani Valerio, Henkel Alessandra, Klopprogge Bjarne, Hannappel Juergen, Oberthuer Dominik, De Gennaro Aquino Ivan, Egorov Dmitry, Munke Anna, Sprenger Janina, Pompidor Guillaume, Taberman Helena, Gruzinov Andrey, Meyer Jan, Hakanpää Johanna, Gasthuber Martin

Affiliations

Center for Data and Computing in Natural Science CDCS, Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany.

Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany.

Publication information

IUCrJ. 2025 Jan 1;12(Pt 1):97-108. doi: 10.1107/S2052252524011837.

Abstract

We report the use of streaming data interfaces to perform fully online data processing for serial crystallography experiments, without storing intermediate data on disk. The system produces Bragg reflection intensity measurements suitable for scaling and merging, with a latency of less than 1 s per frame. Our system uses the CrystFEL software in combination with the ASAP::O data framework. In a series of user experiments at PETRA III, frames from a 16 megapixel Dectris EIGER2 X detector were searched for peaks, indexed and integrated at the maximum full-frame readout speed of 133 frames per second. The computational resources required depend on various factors, most significantly the fraction of non-blank frames ('hits'). The average single-thread processing time per frame was 242 ms for blank frames and 455 ms for hits, meaning that a single 96-core computing node was sufficient to keep up with the data, with ample headroom for unexpected throughput reductions. Further significant improvements are expected, for example by binning pixel intensities together to reduce the pixel count. We discuss the implications of real-time data processing on the 'data deluge' problem from recent and future photon-science experiments, in particular on calibration requirements, computing access patterns and the need for the preservation of raw data.
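
The claim that one 96-core node keeps up with the detector can be checked with a back-of-the-envelope calculation. The sketch below is not taken from the paper's software. It uses only the figures quoted in the abstract (133 frames per second, 242 ms per blank frame, 455 ms per hit, 96 cores) and assumes that processing scales perfectly across cores with no scheduling or I/O overhead. The function names `mean_frame_time` and `cores_needed` are hypothetical, chosen for illustration.

```python
import math

# Figures quoted in the abstract.
FRAME_RATE_HZ = 133.0   # maximum full-frame readout speed
T_BLANK_S = 0.242       # mean single-thread time per blank frame
T_HIT_S = 0.455         # mean single-thread time per hit
NODE_CORES = 96         # cores in the single computing node


def mean_frame_time(hit_fraction):
    """Mean single-thread processing time per frame for a given hit fraction."""
    return hit_fraction * T_HIT_S + (1.0 - hit_fraction) * T_BLANK_S


def cores_needed(hit_fraction, frame_rate_hz=FRAME_RATE_HZ):
    """Average number of busy cores required to keep pace with the detector.

    Assumption (not from the paper): ideal parallel scaling, so the load is
    simply frame rate multiplied by mean per-frame time.
    """
    return frame_rate_hz * mean_frame_time(hit_fraction)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        busy = cores_needed(fraction)
        print(
            f"hit fraction {fraction:.0%}: about {busy:.1f} cores busy "
            f"({math.ceil(busy)} needed), {NODE_CORES - busy:.1f} cores of headroom"
        )
```

Under these assumptions the load ranges from roughly 32 cores when every frame is blank to roughly 61 cores when every frame is a hit, which is consistent with the abstract's statement that a 96-core node leaves ample headroom.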

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fde/11707691/bd12fce15835/m-12-00097-fig1.jpg
