Zhou Xiaogen, Li Yang, Zhang Chengxin, Zheng Wei, Zhang Guijun, Zhang Yang
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
College of Information Engineering, Zhejiang University of Technology, Hangzhou, China.
Nat Comput Sci. 2022 Apr;2(4):265-275. doi: 10.1038/s43588-022-00232-1. Epub 2022 Apr 28.
Progress in cryo-electron microscopy has provided the potential for large-size protein structure determination. However, the success rate for solving multi-domain proteins remains low because of the difficulty in modelling inter-domain orientations. Here we developed domain enhanced modeling using cryo-electron microscopy (DEMO-EM), an automatic method to assemble multi-domain structures from cryo-electron microscopy maps through a progressive structural refinement procedure combining rigid-body domain fitting and flexible assembly simulations with deep-neural-network inter-domain distance profiles. The method was tested on a large-scale benchmark set of proteins containing up to 12 continuous and discontinuous domains with medium- to low-resolution density maps, where DEMO-EM produced models with correct inter-domain orientations (template modeling score (TM-score) >0.5) for 97% of cases and outperformed state-of-the-art methods. DEMO-EM was applied to the severe acute respiratory syndrome coronavirus 2 genome and generated models with average TM-score and root-mean-square deviation of 0.97 and 1.3 Å, respectively, with respect to the deposited structures. These results demonstrate an efficient pipeline that enables automated and reliable large-scale multi-domain protein structure modelling from cryo-electron microscopy maps.
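The two accuracy measures quoted in the abstract, TM-score and root-mean-square deviation (RMSD), can be illustrated with a minimal sketch. It assumes the model and the deposited structure are already superposed and that the per-residue Cα distances (in Å) are available as a plain list. The function names are hypothetical and are not part of DEMO-EM. The full TM-score also maximizes over superpositions, which this sketch does not attempt, and the 0.5 Å floor on d0 for short chains is an assumption that keeps the formula defined.

```python
import math


def tm_score_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    # Assumption: clamp d0 to 0.5 for short chains, where the cube-root
    # formula would otherwise become very small or undefined.
    if target_length <= 21:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the reference structure.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalizing by the reference length penalizes unaligned residues.
    return total / target_length


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    if not distances:
        raise ValueError("need at least one aligned residue pair")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


if __name__ == "__main__":
    # Toy input: 100 aligned residues, each displaced by 1.3 angstroms.
    toy = [1.3] * 100
    print("TM-score:", round(tm_score(toy, 100), 3))
    print("RMSD:", round(rmsd(toy), 3))
```

A TM-score above 0.5, the threshold the abstract uses for correct inter-domain orientation, corresponds to a typical per-residue deviation below d0, since each aligned pair contributes exactly 0.5 when its distance equals d0.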