Gyawali Rajan, Dhakal Ashwin, Wang Liguo, Cheng Jianlin
Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.
NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA.
bioRxiv. 2024 Mar 14:2023.10.02.560572. doi: 10.1101/2023.10.02.560572.
Picking protein particles in cryo-electron microscopy (cryo-EM) micrographs is a crucial step in cryo-EM-based structure determination. However, existing methods trained on a limited amount of cryo-EM data still cannot accurately pick protein particles from noisy cryo-EM images. General-purpose artificial intelligence (AI) foundation models for image segmentation, such as Meta's Segment Anything Model (SAM), cannot segment protein particles well because their training data do not include cryo-EM images. Here, we present a novel approach, CryoSegNet, that integrates SAM with an attention-gated U-shaped network (U-Net) specially designed and trained for cryo-EM particle picking. The U-Net is first trained on a large cryo-EM image dataset and then used to generate, from the original cryo-EM images, the input from which SAM picks particles. CryoSegNet achieves both high precision and high recall in segmenting protein particles from cryo-EM micrographs, irrespective of protein type, shape, and size. On several independent datasets covering various protein types, CryoSegNet outperforms two top machine learning particle pickers, crYOLO and Topaz, as well as SAM itself. The average resolution of density maps reconstructed from the particles picked by CryoSegNet is 3.32 Å, 7% better than the 3.57 Å of Topaz and 14% better than the 3.85 Å of crYOLO.
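The resolution comparison in the abstract can be checked with a few lines of arithmetic. The sketch below assumes that "X% better" means the relative reduction in the resolution value with respect to the baseline picker (lower Å values indicate finer detail); the abstract does not state the formula, and the function name `relative_improvement` is made up for this illustration.

```python
def relative_improvement(baseline_angstrom: float, new_angstrom: float) -> float:
    """Fractional reduction of the resolution value relative to the baseline.

    Assumption: "better" is measured as (baseline - new) / baseline.
    Lower resolution values (in Å) mean finer detail, so a positive
    result means the new map is better than the baseline.
    """
    if baseline_angstrom <= 0:
        raise ValueError("baseline resolution must be positive")
    return (baseline_angstrom - new_angstrom) / baseline_angstrom


# Average resolutions quoted in the abstract, in Å.
CRYOSEGNET = 3.32
BASELINES = {"Topaz": 3.57, "crYOLO": 3.85}

improvements = {
    name: relative_improvement(resolution, CRYOSEGNET)
    for name, resolution in BASELINES.items()
}

for name, gain in improvements.items():
    print(f"CryoSegNet vs {name}: {gain:.1%} better")
```

Under this assumption the values come out at about 7.0% for Topaz and 13.8% for crYOLO, which round to the 7% and 14% quoted in the abstract.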