Biocomputing Unit, Centro Nacional de Biotecnologia (CNB-CSIC), Calle Darwin 3, 28049 Cantoblanco, Madrid, Spain.
Masaryk University, Brno, Czech Republic.
Acta Crystallogr D Struct Biol. 2022 Apr 1;78(Pt 4):410-423. doi: 10.1107/S2059798322001978. Epub 2022 Mar 16.
Cryo-electron microscopy (cryoEM) has become a well-established technique to elucidate the 3D structures of biological macromolecules. Projection images from thousands of macromolecules that are assumed to be structurally identical are combined into a single 3D map representing the Coulomb potential of the macromolecule under study. This article discusses possible caveats along the image-processing path and how to avoid them to obtain a reliable 3D structure. Some of these problems are very well known in the community. These may be referred to as sample-related (such as specimen denaturation at interfaces or non-uniform projection geometry leading to underrepresented projection directions). The rest are related to the algorithms used. While some have been discussed in depth in the literature, such as the use of an incorrect initial volume, others have received much less attention. However, they are fundamental in any data-analysis approach. Chief among them are instabilities, occurring all along the processing workflow, in estimating many of the key parameters that are required for a correct 3D reconstruction; these may significantly affect the reliability of the whole process. In the field, the term overfitting has been coined to refer to some particular kinds of artifacts. It is argued that overfitting is a statistical bias in key parameter-estimation steps in the 3D reconstruction process, including intrinsic algorithmic bias. It is also shown that common tools (Fourier shell correlation) and strategies (gold standard) that are normally used to detect or prevent overfitting do not fully protect against it. Instead, it is proposed that the bias that leads to overfitting is much easier to detect at the level of parameter estimation than once the particle images have been combined into a 3D map.
Comparing the results from multiple algorithms (or at least, independent executions of the same algorithm) can detect parameter bias. The estimates from these multiple executions could then be averaged to give a lower-variance estimate of the underlying parameters.
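The comparison-and-averaging idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `compare_runs`, the choice of one scalar parameter per particle (for example an in-plane shift), and the `tolerance` threshold are all assumptions made for illustration. Particles whose estimates disagree across runs by more than the tolerance are flagged as unstable; the remaining particles receive the mean of their estimates.

```python
from statistics import mean


def compare_runs(runs, tolerance):
    """Compare per-particle estimates of one scalar parameter across runs.

    runs[r][i] is the estimate for particle i from independent run r
    (hypothetical layout). Returns (consensus, unstable): consensus[i] is
    the mean across runs, or None when the runs disagree by more than
    `tolerance`; unstable lists the indices of the disagreeing particles.
    """
    n_particles = len(runs[0])
    if any(len(run) != n_particles for run in runs):
        raise ValueError("every run must report the same particles")

    consensus, unstable = [], []
    for i in range(n_particles):
        values = [run[i] for run in runs]
        spread = max(values) - min(values)  # disagreement between runs
        if spread > tolerance:
            unstable.append(i)
            consensus.append(None)  # no trustworthy estimate for this particle
        else:
            consensus.append(mean(values))  # averaging lowers the variance
    return consensus, unstable


# Three hypothetical runs estimating a shift (in pixels) for three particles.
runs = [
    [1.0, 2.0, 3.0],
    [1.25, 2.25, 5.0],
    [0.75, 1.75, 1.0],
]
consensus, unstable = compare_runs(runs, tolerance=1.0)
```

Here the third particle is flagged because its estimates span 4 pixels, while the first two are averaged. Angular parameters would need a rotation-aware distance and mean in place of the plain difference and arithmetic mean used in this sketch.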