Liu Jing, van der Schot Gijs, Engblom Stefan
Opt Express. 2019 Feb 18;27(4):3884-3899. doi: 10.1364/OE.27.003884.
Current Flash X-ray single-particle diffraction Imaging (FXI) experiments, which operate on modern X-ray Free Electron Lasers (XFELs), can record millions of interpretable diffraction patterns from individual biomolecules per day. Owing to practical limitations of the FXI technology, those patterns will, to varying degrees, include scattering from contaminated samples. In addition, heterogeneity of the sample biomolecules is unavoidable and complicates data processing. Reducing the data volume and selecting high-quality single-molecule patterns are therefore critical steps in the experimental setup. In this paper, we present two supervised template-based learning methods for classifying FXI patterns. Our Eigen-Image and Log-Likelihood classifiers can find the best-matched template for a single-molecule pattern within a few milliseconds. They are also straightforward to parallelize so as to fully match the XFEL repetition rate, thereby enabling on-site processing. The methods perform stably on various kinds of synthetic data. As a practical example, we tested our methods on a real mimivirus dataset, obtaining a convincing classification accuracy of 0.9.
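To illustrate the general idea of likelihood-based template matching mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that each pixel's photon count is independent and Poisson-distributed around the template's expected intensity, which is a common model for photon-counting detectors but is an assumption here. The function names, the four-pixel templates, and the labels `single` and `double` are hypothetical and chosen only for illustration.

```python
import math

def poisson_log_likelihood(pattern, template, eps=1e-9):
    """Log-likelihood of integer photon counts `pattern`, assuming each
    pixel is Poisson-distributed with mean given by `template` (assumption)."""
    total = 0.0
    for k, lam in zip(pattern, template):
        lam = max(lam, eps)  # guard against log(0) for dark template pixels
        # log P(k | lam) = k*log(lam) - lam - log(k!)
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total

def classify(pattern, templates):
    """Return (label, score) for the template with the highest log-likelihood."""
    scores = {
        label: poisson_log_likelihood(pattern, template)
        for label, template in templates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical, hand-made templates: expected photon counts per pixel.
templates = {
    "single": [5.0, 1.0, 0.5, 0.1],
    "double": [2.0, 2.0, 2.0, 2.0],
}

observed = [6, 1, 0, 0]  # toy flattened diffraction pattern
label, score = classify(observed, templates)
print(label, round(score, 3))
```

Because each pattern is scored against the templates independently of every other pattern, a scheme of this kind parallelizes trivially across patterns, which is consistent with the abstract's remark about matching the XFEL repetition rate.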