Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States.
NextGen Precision Health, University of Missouri, Columbia, MO 65211, United States.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae282.
Picking protein particles in cryo-electron microscopy (cryo-EM) micrographs is a crucial step in cryo-EM-based structure determination. However, existing methods, trained on a limited amount of cryo-EM data, still cannot accurately pick protein particles from noisy cryo-EM images. General-purpose foundation models for image segmentation, such as Meta's Segment Anything Model (SAM), cannot segment protein particles well because their training data do not include cryo-EM images. Here, we present CryoSegNet, a novel approach that integrates an attention-gated U-shaped network (U-Net), specially designed and trained for cryo-EM particle picking, with SAM. The U-Net is first trained on a large cryo-EM image dataset and is then used to generate, from the original cryo-EM images, the input from which SAM picks particles. CryoSegNet achieves both high precision and high recall in segmenting protein particles from cryo-EM micrographs, irrespective of protein type, shape and size. On several independent datasets of various protein types, CryoSegNet outperforms two top machine learning particle pickers, crYOLO and Topaz, as well as SAM itself. The average resolution of density maps reconstructed from the particles picked by CryoSegNet is 3.33 Å, 7% better than the 3.58 Å of Topaz and 14% better than the 3.87 Å of crYOLO. CryoSegNet is publicly available at https://github.com/jianlin-cheng/CryoSegNet.
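The reported percentages can be checked with a short calculation. The sketch below assumes that "better" means the relative reduction in the resolution value (a lower Å value is a finer resolution), measured against each baseline picker; the function name `relative_improvement` is illustrative and is not taken from the CryoSegNet code.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Percent reduction of a resolution value in angstroms (lower is better).

    Assumption: improvement is measured relative to the baseline picker.
    """
    return (baseline - improved) / baseline * 100.0


# Average resolutions (in angstroms) as stated in the abstract.
CRYOSEGNET = 3.33
BASELINES = {"Topaz": 3.58, "crYOLO": 3.87}

for name, resolution in BASELINES.items():
    gain = relative_improvement(resolution, CRYOSEGNET)
    print(f"CryoSegNet vs {name}: {gain:.1f}% lower resolution value")
```

Under this assumption the values round to the 7% and 14% quoted above.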