Shah Pranav N M, Sanchez-Garcia Ruben, Stuart David I
Division of Structural Biology, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, United Kingdom.
School of Science and Technology, IE University, Paseo de la Castellana 259, 28046 Madrid, Spain.
Acta Crystallogr D Struct Biol. 2025 Feb 1;81(Pt 2):63-76. doi: 10.1107/S2059798325000865.
Cryo-electron tomography is a rapidly developing field for studying macromolecular complexes in their native environments and has the potential to revolutionize our understanding of protein function. However, fast and accurate identification of particles in cryo-tomograms is challenging and represents a significant bottleneck in downstream processes such as subtomogram averaging. Here, we present tomoCPT (Tomogram Centroid Prediction Tool), a transformer-based solution that reformulates particle detection as a centroid-prediction task using Gaussian labels. Our approach, which is built upon the SwinUNETR architecture, outperforms both conventional binary labelling strategies and template matching. We show that tomoCPT generalizes effectively to novel particle types through zero-shot inference and can be significantly improved by fine-tuning with limited data. We validate tomoCPT with three case studies: apoferritin, reaching 3.0 Å resolution compared with 3.3 Å using template matching; SARS-CoV-2 spike proteins on cell surfaces, yielding an 18.3 Å resolution map where template matching was unsuccessful; and rubisco molecules within carboxysomes, reaching 8.0 Å resolution. These results demonstrate the ability of tomoCPT to handle varied scenarios, including densely packed environments and membrane-bound proteins. Its implementation as a command-line program, together with its minimal data requirements for fine-tuning, makes it a practical solution for high-throughput cryo-ET data-processing workflows.
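The abstract describes centroid prediction with Gaussian labels only at a high level. The sketch below shows what that general idea can look like, under stated assumptions: a dense 3D volume indexed as (z, y, x), one isotropic Gaussian per particle with a hand-picked `sigma`, overlapping blobs combined by taking the voxelwise maximum, and centroids read back out of a predicted heatmap with a simple 3×3×3 local-maximum test above a threshold. The helper names `gaussian_label_volume` and `pick_centroids`, as well as the `sigma` and `threshold` values, are hypothetical and illustrative. This is not tomoCPT's published implementation or command-line interface.

```python
import numpy as np


def gaussian_label_volume(shape, centroids, sigma):
    """Hypothetical helper: build a 3D target with a Gaussian blob per centroid.

    Assumption: each blob peaks at 1.0 on its centroid, and where blobs
    overlap, the voxel keeps the maximum value rather than the sum.
    """
    zz, yy, xx = np.indices(shape, dtype=np.float32)
    label = np.zeros(shape, dtype=np.float32)
    for cz, cy, cx in centroids:
        sq_dist = (zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2
        blob = np.exp(-sq_dist / (2.0 * sigma ** 2))
        np.maximum(label, blob, out=label)  # combine overlapping particles by max
    return label


def pick_centroids(heatmap, threshold):
    """Hypothetical helper: return voxels that are local maxima above `threshold`.

    A voxel counts as a peak if it exceeds `threshold` and no voxel in its
    3x3x3 neighbourhood is strictly larger. Borders are padded with -inf.
    """
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    is_peak = heatmap > threshold
    d0, d1, d2 = heatmap.shape
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dz == 0 and dy == 0 and dx == 0:
                    continue
                # Shifted view of the padded volume aligned with `heatmap`.
                neighbour = padded[1 + dz:1 + dz + d0,
                                   1 + dy:1 + dy + d1,
                                   1 + dx:1 + dx + d2]
                is_peak &= heatmap >= neighbour
    return [tuple(int(v) for v in idx) for idx in np.argwhere(is_peak)]


if __name__ == "__main__":
    # Round trip: labels built from known centroids should give them back.
    label = gaussian_label_volume((32, 32, 32), [(8, 8, 8), (20, 16, 24)], sigma=2.0)
    print(pick_centroids(label, threshold=0.5))  # [(8, 8, 8), (20, 16, 24)]
```

In a trained detector the heatmap would come from the network rather than from the labels themselves; the round trip here only checks that label generation and peak picking are consistent. A soft Gaussian target gives the model a graded signal around each centre instead of a hard binary mask, and the threshold and neighbourhood size trade off missed particles against duplicate picks.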