Völter Constantin, Starostin Vladimir, Lapkin Dmitry, Munteanu Valentin, Romodin Mikhail, Hylinski Maik, Gerlach Alexander, Hinderhofer Alexander, Schreiber Frank
Institute of Applied Physics, University of Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany.
Cluster of Excellence 'Machine learning - new perspectives for science', University of Tübingen, Maria-von-Linden-Straße 6, 72076 Tübingen, Germany.
J Appl Crystallogr. 2025 Feb 28;58(Pt 2):513-522. doi: 10.1107/S1600576725000974. eCollection 2025 Apr 1.
Recent advancements in X-ray sources and detectors have dramatically increased data generation, leading to a greater demand for automated data processing. This is particularly relevant for real-time grazing-incidence wide-angle X-ray scattering (GIWAXS) experiments, which can produce hundreds of thousands of diffraction images in a single day at a synchrotron beamline. Deep learning (DL)-based peak-detection techniques are becoming prominent in this field, but rigorous benchmarking is essential to evaluate their reliability, identify potential problems, explore avenues for improvement and give researchers the confidence to integrate these techniques into their workflows. However, the systematic evaluation of these techniques has been hampered by the lack of annotated GIWAXS datasets, standardized metrics and baseline models. To address these challenges, we introduce a comprehensive framework comprising an annotated experimental dataset, physics-informed metrics adapted to the GIWAXS geometry and a competitive baseline: a classical, non-DL peak-detection algorithm optimized on our dataset. Furthermore, we apply our framework to benchmark a recent DL solution trained on simulated data and find that it outperforms our baseline. This analysis not only highlights the effectiveness of DL methods for identifying diffraction peaks but also provides insights for further development of these solutions.