Zhang Chi, Cheng Yiran, Feng Kaiwen, Zhang Fa, Han Renmin, Feng Jieqing
State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China.
Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266000, Shandong, China.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae636.
Automatic single-particle picking is a critical step in the data processing pipeline of cryo-electron microscopy structure reconstruction. In recent years, several deep learning-based algorithms have been developed and have shown potential to address this challenge. However, current methods depend heavily on manually labeled training data, which is labor-intensive to produce and prone to bias, especially for high-noise and low-contrast micrographs, resulting in suboptimal precision and recall. To address these problems, we propose UPicker, a semi-supervised transformer-based particle-picking method with a two-stage training process: unsupervised pretraining followed by supervised fine-tuning. During unsupervised pretraining, an Adaptive Laplacian of Gaussian region proposal generator obtains pseudo-labels from unlabeled data for initial feature learning. During supervised fine-tuning, UPicker needs only a small amount of labeled data to achieve high particle-picking accuracy. To further enhance model performance, UPicker employs a contrastive denoising training strategy to reduce redundant detections and accelerate convergence, along with a hybrid data augmentation strategy to compensate for limited labeled data. Comprehensive experiments on both simulated and experimental datasets demonstrate that UPicker outperforms state-of-the-art particle-picking methods in accuracy and robustness while requiring less labeled data than other transformer-based models. Furthermore, ablation studies demonstrate the effectiveness and necessity of each component of UPicker. The source code and data are available at https://github.com/JachyLikeCoding/UPicker.
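The pseudo-labeling idea can be illustrated with a generic Laplacian of Gaussian (LoG) blob detector. The sketch below is not UPicker's Adaptive LoG generator, whose details are not given in the abstract. It assumes bright, roughly circular particles of known scale, and it stands in for the adaptive step with a simple threshold of mean plus `k` standard deviations of the filter response. The function names `log_kernel`, `log_response`, and `propose_boxes` and the parameters `sigma` and `k` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def log_kernel(sigma):
    """Scale-normalized LoG kernel, sign-flipped so bright blobs give positive peaks."""
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    gauss = np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    kernel = (r2 - 2 * sigma**2) / sigma**4 * gauss  # Laplacian of the Gaussian
    kernel -= kernel.mean()  # force zero sum so flat regions give zero response
    return -(sigma**2) * kernel


def log_response(image, sigma):
    """Filter the image with the LoG kernel (the kernel is symmetric)."""
    kernel = log_kernel(sigma)
    radius = kernel.shape[0] // 2
    padded = np.pad(image, radius, mode="reflect")
    windows = sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def propose_boxes(image, sigma, k=3.0):
    """Return (x0, y0, x1, y1) pseudo-label boxes around strong LoG peaks."""
    resp = log_response(np.asarray(image, dtype=float), sigma)
    # Assumed stand-in for an adaptive rule: threshold from response statistics.
    thresh = resp.mean() + k * resp.std()
    r = int(np.ceil(np.sqrt(2) * sigma))  # blob radius matched to scale sigma
    padded = np.pad(resp, r, mode="constant", constant_values=-np.inf)
    neigh_max = sliding_window_view(padded, (2 * r + 1, 2 * r + 1)).max(axis=(2, 3))
    ys, xs = np.nonzero((resp == neigh_max) & (resp > thresh))
    return [(int(x) - r, int(y) - r, int(x) + r, int(y) + r) for y, x in zip(ys, xs)]


# Synthetic "micrograph": two Gaussian blobs plus noise (illustrative data only).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
image = rng.normal(0.0, 0.05, size=(64, 64))
for cx, cy in [(16, 20), (48, 44)]:
    image += np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * 3.0**2))

boxes = propose_boxes(image, sigma=3.0)
print(boxes)
```

In a pretraining setup of the kind described, boxes like these would serve as noisy targets for an object detector before fine-tuning on a small labeled set. Real micrographs usually need contrast inversion and denoising first, and a multi-scale search over `sigma` when particle size is unknown.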