Bortel G, Faigel G, Tegze M
Research Institute for Solid State Physics and Optics of the Hungarian Academy of Sciences, P.O. Box 49, H-1525 Budapest, Hungary.
J Struct Biol. 2009 May;166(2):226-33. doi: 10.1016/j.jsb.2009.01.005.
Single-molecule imaging experiments at future X-ray free-electron laser sources will provide a large number of diffraction patterns with random 3D orientations and low counting statistics. This vast amount of data must be grouped into classes of similar orientation and averaged before orientation determination and structure reconstruction can take place. Classification algorithms that compare all pairs of patterns scale badly with the number of patterns in terms of computing requirements, which becomes a problem as resolution improves and signal-to-noise ratios decrease. We describe an algorithm that performs significantly fewer pattern comparisons and renders classification possible in such cases. The invariance of patterns under rotation of the object about the primary beam axis is also exploited to decrease the number of classes and improve the quality of class averages. This work is the first to demonstrate that it is possible to classify a dataset with realistic target parameters: 10 keV photon energy, 10¹² photons/pulse, 100 × 100 nm² focusing, 538 kDa protein, 2.4 Å resolution, 10⁶ patterns, approximately 3 × 10⁴ classes, <1° misorientation within classes. The effects of molecular symmetry and its consequences for classification are also analyzed.
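To give a sense of the scaling problem the abstract mentions, the following minimal sketch counts comparisons for the stated target parameters (10⁶ patterns, approximately 3 × 10⁴ classes). The abstract does not say how the authors' algorithm reduces the number of comparisons, so the second function is only an illustrative assumption: it counts the cost of comparing each pattern once against each class average. It is not the published method, and the function names are hypothetical.

```python
def all_pair_comparisons(n_patterns: int) -> int:
    """Number of unordered pattern pairs: N * (N - 1) / 2."""
    return n_patterns * (n_patterns - 1) // 2


def pattern_to_class_comparisons(n_patterns: int, n_classes: int) -> int:
    """Upper bound when every pattern is compared with every class average.

    Assumption for illustration only; the abstract does not describe
    the comparison scheme actually used.
    """
    return n_patterns * n_classes


# Target parameters quoted in the abstract.
N_PATTERNS = 10**6
N_CLASSES = 3 * 10**4

pairs = all_pair_comparisons(N_PATTERNS)
per_class = pattern_to_class_comparisons(N_PATTERNS, N_CLASSES)

print(f"all-pair comparisons:         {pairs:.3e}")
print(f"pattern-to-class comparisons: {per_class:.3e}")
print(f"reduction factor:             {pairs / per_class:.1f}")
```

Under these assumptions the all-pair count grows quadratically with the number of patterns, while the pattern-to-class count grows only linearly once the number of classes is fixed.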