Li Hongjia, Chen Ge, Gao Shan, Li Jintao, Wan Xiaohua, Zhang Fa
High Performance Computer Research Center, Institute of Computing Technology, Beijing, China.
University of Chinese Academy of Sciences, Beijing, China.
J Comput Biol. 2022 Oct;29(10):1117-1131. doi: 10.1089/cmb.2022.0101. Epub 2022 Aug 18.
Cryo-electron microscopy (cryo-EM) single-particle analysis requires tens of thousands of particle projections to reveal the structural information of macromolecular complexes. However, owing to the low signal-to-noise ratio and the presence of high-contrast artifacts and contaminants in micrographs, semiautomatic and fully automatic particle-picking algorithms tend to suffer from high false-positive rates, which lowers confidence in structure determination. In this study, we introduce PickerOptimizer (PO), a transfer learning-based classification neural network for particle pruning in cryo-EM, as an additional strategy to complement current automated particle-picking algorithms. To achieve high classification performance with minimal human intervention, we adopted two key strategies: (1) using transfer learning to train the convolutional neural network, so that knowledge gained from public classification datasets is applied to the field of cryo-EM; and (2) designing a multiloss strategy, a combination of multiple loss functions, to guide the optimization of the network parameters. To reduce the domain shift between cryo-EM images and the natural images used for pretraining, we built the first image classification dataset for cryo-EM, which contains positive and negative samples collected from EMPIAR entries. PO was tested on 14 public experimental datasets and achieved accuracy and F1 scores above 95% in most cases. Furthermore, three case studies, in which PO is applied to problematic particle selections, verify the model's performance and show that our algorithm performs better than or comparably to other particle pruning strategies.
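The abstract does not say which loss functions make up the multiloss strategy, nor how the reported scores are computed. The following is a minimal sketch that assumes a binary classifier whose outputs are probabilities that each picked particle is real (label 1) rather than a false positive (label 0). It shows one hypothetical multiloss: a weighted sum of binary cross-entropy and focal loss. It also shows how accuracy and F1 are derived from thresholded predictions. The choice of losses, the weights, `gamma`, the 0.5 threshold and all function names are illustrative assumptions, not PickerOptimizer's actual configuration.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-7  # clamp probabilities away from 0 and 1 so log() stays finite


def bce_loss(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy; label 1 = true particle, 0 = false positive."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, EPS), 1 - EPS)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def focal_loss(probs: Sequence[float], labels: Sequence[int], gamma: float = 2.0) -> float:
    """Mean binary focal loss: cross-entropy scaled by (1 - p_t)**gamma,
    which down-weights examples the model already classifies confidently."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, EPS), 1 - EPS)
        p_t = p if y == 1 else 1 - p  # probability assigned to the true class
        total += -((1 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def multi_loss(probs: Sequence[float], labels: Sequence[int],
               weights: Tuple[float, float] = (1.0, 1.0)) -> float:
    """Hypothetical multiloss: weighted sum of BCE and focal loss (assumed pairing)."""
    w_bce, w_focal = weights
    return w_bce * bce_loss(probs, labels) + w_focal * focal_loss(probs, labels)


def accuracy_and_f1(probs: Sequence[float], labels: Sequence[int],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Accuracy and F1 (positive class = true particle) after thresholding."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    accuracy = correct / len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy scores for five picked particles; the last one is a false positive
    # that the classifier fails to prune at the 0.5 threshold.
    probs = [0.9, 0.8, 0.3, 0.2, 0.6]
    labels = [1, 1, 0, 0, 0]
    print(multi_loss(probs, labels))
    print(accuracy_and_f1(probs, labels))  # → (0.8, 0.8)
```

In this toy example, four of the five picks are classified correctly, which gives an accuracy of 0.8. With two true positives, one false positive and no false negatives, F1 = 2·2 / (2·2 + 1 + 0) = 0.8. Because the focal term never exceeds the cross-entropy term for any sample, adding it mainly shifts weight toward hard, ambiguous picks. This is one plausible motivation for combining losses, though the paper's own rationale may differ.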