School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China.
Nat Commun. 2024 Oct 11;15(1):8808. doi: 10.1038/s41467-024-53116-5.
Cryo-electron microscopy (cryo-EM) is widely used for protein structure determination. Current automatic cryo-EM protein complex modeling methods mostly rely on prior chain separation. However, chain separation without sequence guidance often suffers from errors caused by cross-chain interactions or noise densities, and these errors accumulate and mislead subsequent steps. Here, we present EModelX, a fully automated cryo-EM protein complex structure modeling method that achieves sequence-guided modeling through cross-modal alignments between cryo-EM maps and protein sequences. EModelX first employs multi-task deep learning to predict Cα atoms, backbone atoms, and amino acid types from cryo-EM maps; these predictions are subsequently used to sample Cα traces with amino acid profiles. The profiles are then aligned with protein sequences to obtain initial structural models, which yielded an average RMSD of 1.17 Å on our test set, approaching atomic-level precision in recovering PDB-deposited structures. After unmodeled gaps were filled through sequence-guided Cα threading, the final models achieved an average TM-score of 0.808, outperforming the state-of-the-art method. Further combination with AlphaFold improves the average TM-score to 0.911. Analyses comparing some EModelX-built models with PDB structures highlight its potential to improve PDB structures. EModelX is accessible at https://bio-web1.nscc-gz.cn/app/EModelX.
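For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how a Cα RMSD and a TM-score can be computed for a model against a reference structure. It is not the evaluation code used for EModelX. It assumes the two coordinate lists are already superimposed and matched residue by residue, whereas the standard TM-score additionally searches for the superposition that maximizes the score. The function names `ca_rmsd` and `tm_score_fixed` are hypothetical, and the 0.5 Å floor on `d0` for short chains is an assumption of this sketch.

```python
import math

def ca_rmsd(model, ref):
    """Root-mean-square deviation (in Å) over matched C-alpha coordinates.

    Assumes model[i] and ref[i] are (x, y, z) tuples for the same residue
    and that the structures are already superimposed.
    """
    if len(model) != len(ref) or not model:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = [math.dist(m, r) ** 2 for m, r in zip(model, ref)]
    return math.sqrt(sum(squared) / len(squared))

def tm_score_fixed(model, ref, target_len):
    """TM-score for a fixed superposition, normalized by target_len.

    Uses the usual distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    Residues missing from the model contribute zero, so an incomplete
    model is penalized even when its modeled part is accurate.
    """
    if target_len > 21:
        d0 = 1.24 * (target_len - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumed floor for very short chains
    total = sum(1.0 / (1.0 + (math.dist(m, r) / d0) ** 2)
                for m, r in zip(model, ref))
    return total / target_len

# Toy data: a straight 100-residue reference and a model shifted by 1 Å in x.
reference = [(3.8 * i, 0.0, 0.0) for i in range(100)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]

print(ca_rmsd(shifted, reference))
print(tm_score_fixed(shifted, reference, target_len=100))
# A model covering only half of the target scores lower on TM-score.
print(tm_score_fixed(reference[:50], reference[:50], target_len=100))
```

The last call illustrates why the two metrics are reported separately: RMSD measures the accuracy of the residues that were modeled, while TM-score also reflects how much of the target was covered.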