Uddin Mostofa Rafid, Ahmed Ajmain Yasar, Tahmid Toki, Alam Zarif Ul, Freyberg Zachary, Xu Min
Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA.
bioRxiv. 2024 Nov 6:2024.11.04.620735. doi: 10.1101/2024.11.04.620735.
Particle picking in cryo-electron tomography (cryo-ET) tomograms is crucial for in situ structure detection of macromolecules and protein complexes. Traditional template-matching approaches to particle picking suffer from template-specific biases and have low throughput, which motivates learning-based solutions. However, the paucity of annotated training data poses substantial challenges for learning-based approaches, and preparing extensively annotated cryo-ET tomograms is extremely time-consuming and burdensome. To address these challenges, we present TomoPicker, an annotation-efficient particle-picking approach that can effectively pick particles when only a minuscule portion (~0.3-0.5%) of the total particles in a cellular cryo-ET dataset is provided for training. TomoPicker treats particle picking as a voxel classification problem and solves it with two different positive-unlabeled learning approaches. We evaluated our method on a benchmark cryo-ET dataset of eukaryotic cells, where TomoPicker achieved about a 30% improvement over the most recent state-of-the-art annotation-efficient learning-based picking approaches.
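To make the positive-unlabeled (PU) framing concrete, the following is a minimal sketch of one widely used PU objective, the non-negative PU risk estimator, applied to per-voxel classifier scores. The abstract does not name the two PU approaches TomoPicker uses, so this should be read as an illustration of the general idea and not as the authors' method. The function names, the sigmoid surrogate loss, and the class prior `prior` (the assumed fraction of voxels that belong to particles) are all assumptions introduced here.

```python
import numpy as np


def sigmoid_loss(scores, label):
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)) for label y in {+1, -1}."""
    return 1.0 / (1.0 + np.exp(label * scores))


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk for classifier scores (illustrative sketch only).

    pos_scores: scores of the few voxels annotated as particles (positives).
    unl_scores: scores of unlabeled voxels (a mix of particles and background).
    prior:      assumed fraction of positive voxels (hypothetical hyperparameter).
    """
    # Risk of scoring labeled positives as positive, weighted by the class prior.
    risk_pos = prior * sigmoid_loss(pos_scores, +1).mean()
    # Negative-class risk estimated from unlabeled voxels, corrected for the
    # positives hidden among them.
    risk_neg = sigmoid_loss(unl_scores, -1).mean() - prior * sigmoid_loss(pos_scores, -1).mean()
    # Clamp at zero so the empirical negative risk cannot go below zero.
    return float(risk_pos + max(0.0, risk_neg))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.normal(loc=2.0, size=50)       # a handful of annotated particle voxels
    unl = rng.normal(loc=-1.0, size=10_000)  # many unlabeled voxels
    print(nn_pu_risk(pos, unl, prior=0.05))
```

In a training loop, a quantity like this would be minimized over the classifier's parameters; the clamp keeps a flexible model from driving the estimated negative risk below zero when only a few positives are labeled.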