Ortiz Sebastian, Stanisic Luka, Rodriguez Boris A, Rampp Markus, Hummer Gerhard, Cossio Pilar
Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia UdeA, Calle 70 No. 52-21, Medellín, Colombia.
Max Planck Computing and Data Facility, 85748 Garching, Germany.
J Struct Biol X. 2020 Jul 21;4:100032. doi: 10.1016/j.yjsbx.2020.100032. eCollection 2020.
Cryo-electron microscopy (cryo-EM) has revolutionized structural biology by providing 3D density maps of biomolecules at near-atomic resolution. However, map validation remains an open issue: despite several efforts by the community, it is still possible to overfit 3D maps to noisy data. Here, we develop a new methodology that uses a small independent set of particles (not used during 3D refinement) to validate the maps. The main idea is to monitor how the probability of the map, given this control set, evolves during the 3D refinement. The method is complementary to the gold-standard procedure, which generates two reconstructions at each iteration. We low-pass filter the two reconstructions at different frequency cutoffs and calculate the probability of each filtered map given the control set. For high-quality maps, the probability should increase as a function of both the frequency cutoff and the refinement iteration. We also compute the similarity between the probability density distributions of the two reconstructions. As higher frequencies are included, the distributions become more dissimilar. We optimized the BioEM package to perform these calculations and tested it on systems ranging from high-quality data to pure noise. Our results show that our methodology can discriminate datasets constructed from noise particles. We conclude that validation against a control particle set provides a powerful tool to assess the quality of cryo-EM maps.
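The filtering and scoring steps described above can be illustrated with a toy 2D sketch. It is not BioEM's implementation and not the authors' exact procedure: it assumes a known particle orientation, no CTF, and a fixed Gaussian noise level `sigma`, and it uses the Jensen-Shannon divergence as an assumed stand-in for the paper's similarity measure. All function names (`low_pass`, `log_prob`, `js_divergence`, `evaluate`) are hypothetical.

```python
import numpy as np

def low_pass(image, cutoff):
    """Keep only Fourier components with spatial frequency <= cutoff (cycles/pixel; Nyquist = 0.5)."""
    fy, fx = np.meshgrid(np.fft.fftfreq(image.shape[0]),
                         np.fft.fftfreq(image.shape[1]), indexing="ij")
    spectrum = np.fft.fft2(image)
    spectrum[np.hypot(fy, fx) > cutoff] = 0.0  # sharp radial cutoff
    return np.real(np.fft.ifft2(spectrum))

def log_prob(particle, reference, sigma):
    """Gaussian log-likelihood of one particle given a reference.
    Toy assumption: pose known, no CTF, known noise sigma (BioEM marginalizes over such parameters)."""
    r = particle - reference
    return -0.5 * np.sum(r**2) / sigma**2 - r.size * np.log(sigma * np.sqrt(2.0 * np.pi))

def js_divergence(a, b, bins=30):
    """Jensen-Shannon divergence (natural log) between two samples, using shared histogram bins."""
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        mask = x > 0  # where x > 0, y = m > 0 as well
        return np.sum(x[mask] * np.log(x[mask] / y[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def evaluate(half_a, half_b, control, sigma, cutoffs):
    """For each cutoff: total control-set log-probability of each filtered half-map,
    plus the divergence between their per-particle log-probability distributions."""
    rows = []
    for c in cutoffs:
        fa, fb = low_pass(half_a, c), low_pass(half_b, c)
        la = np.array([log_prob(p, fa, sigma) for p in control])
        lb = np.array([log_prob(p, fb, sigma) for p in control])
        rows.append((c, la.sum(), lb.sum(), js_divergence(la, lb)))
    return rows

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    yy, xx = np.mgrid[-16:16, -16:16]
    truth = np.exp(-(xx**2 + yy**2) / 40.0)                    # hypothetical smooth "map"
    half_a = truth + 0.05 * rng.standard_normal(truth.shape)    # stand-ins for the two half-reconstructions
    half_b = truth + 0.05 * rng.standard_normal(truth.shape)
    control = truth + 0.5 * rng.standard_normal((50, *truth.shape))  # held-out control particles
    for c, lpa, lpb, jsd in evaluate(half_a, half_b, control, 0.5, [0.05, 0.1, 0.25, 0.5]):
        print(f"cutoff={c:.2f}  logP_A={lpa:.1f}  logP_B={lpb:.1f}  JS={jsd:.3f}")
```

In this toy, the half-maps share the same signal and differ only by independent noise, so tracking the summed log-probability and the divergence across cutoffs shows where added frequencies stop helping the fit to held-out particles. A real analysis would score projections of the 3D map over orientations, as BioEM does.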