Department of Computing, Xiangtan University, Xiangtan, China.
BMC Bioinformatics. 2024 Feb 20;25(1):77. doi: 10.1186/s12859-023-05565-w.
Cryo-electron microscopy (cryo-EM) plays an increasingly important role in determining the three-dimensional (3D) structure of macromolecules. To achieve 3D reconstructions close to atomic resolution, 2D single-particle image classification not only aids single-particle selection but is also a key step that affects 3D reconstruction. Its main task is to cluster and align 2D single-particle images into homogeneous groups so that averaging within each group yields sharper single-particle images. The main difficulties are that cryo-EM single-particle images have a low signal-to-noise ratio (SNR), that the data cannot be labeled manually, and that the projection directions are random with an unknown distribution. Therefore, how to extract effective particle features at low SNR, improve clustering accuracy, and thereby improve reconstruction accuracy is a key problem in the 2D analysis of cryo-EM single-particle images.
To address these problems, we propose a learnable deep clustering method and a fast aligned weighted-averaging method based on the frequency domain, which together improve the class averages and the reconstruction accuracy. The gains are most pronounced in the feature extraction and dimensionality reduction module. Compared with classification methods based on Bayesian inference and maximum likelihood, which require a large amount of single-particle data to estimate the relative angular orientation of macromolecular single particles in the 3D structure, the proposed clustering method shows good results.
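As an illustration of frequency-domain alignment followed by weighted averaging, the sketch below aligns each image in a class to a reference by FFT cross-correlation and then averages the aligned images. It is a minimal sketch under stated assumptions, not the method of the paper: it handles circular translation only (no in-plane rotation), and the function names `estimate_shift` and `aligned_weighted_average`, as well as the choice of a clipped normalized correlation score as the weight, are hypothetical.

```python
import numpy as np


def estimate_shift(reference, image):
    """Return the circular (dy, dx) shift that best aligns `image` to `reference`."""
    # Cross-correlation theorem: correlation = IFFT(FFT(ref) * conj(FFT(img))).
    cross_power = np.fft.fft2(reference) * np.conj(np.fft.fft2(image))
    correlation = np.fft.ifft2(cross_power).real
    # The location of the correlation peak is the shift to apply to `image`.
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    return int(dy), int(dx)


def aligned_weighted_average(reference, images):
    """Align each image to `reference`, then average with similarity weights."""
    aligned, weights = [], []
    ref_centered = reference - reference.mean()
    for image in images:
        dy, dx = estimate_shift(reference, image)
        shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
        # Assumed weighting: normalized correlation with the reference,
        # clipped at zero so dissimilar images contribute nothing.
        img_centered = shifted - shifted.mean()
        denom = np.linalg.norm(img_centered) * np.linalg.norm(ref_centered) + 1e-12
        score = float((img_centered * ref_centered).sum() / denom)
        aligned.append(shifted)
        weights.append(max(score, 0.0))
    weights = np.asarray(weights)
    if weights.sum() == 0.0:
        # Fall back to a plain average if every weight was clipped to zero.
        weights = np.ones_like(weights)
    return np.average(np.stack(aligned), axis=0, weights=weights)
```

In practice the reference could be a current class average that is refined over several iterations; computing the correlation through the FFT costs O(n log n) per image instead of testing every shift explicitly.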
SimcryoCluster uses contrastive learning to perform well on the classification of unlabeled, high-noise cryo-EM single-particle images, making it an important tool for cryo-EM protein structure determination.
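To make the idea of contrastive learning on unlabeled images concrete, the following sketch computes an NT-Xent (normalized temperature-scaled cross-entropy) loss over two augmented views of the same batch. The abstract does not state which contrastive objective SimcryoCluster uses, so NT-Xent is an assumption here, and the function name `nt_xent_loss` and the default temperature are hypothetical.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for two (N, D) embedding arrays of the same N images.

    Row i of `z_a` and row i of `z_b` are treated as a positive pair;
    every other row in the combined batch acts as a negative.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    # L2-normalize so the dot product is the cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    # Exclude each embedding's similarity with itself.
    np.fill_diagonal(sim, -np.inf)
    # Index of the positive partner for every row of the combined batch.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Row-wise log-softmax, computed stably by subtracting the row maximum.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    return float(-log_prob[np.arange(2 * n), positives].mean())
```

Minimizing this loss pulls the two views of each particle image together in embedding space and pushes other images away, which needs no manual labels; the resulting embeddings could then be passed to a clustering step.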