Division of Computing and Software Systems, University of Washington Bothell, Bothell, Washington.
Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia.
Biophys J. 2022 Aug 2;121(15):2840-2848. doi: 10.1016/j.bpj.2022.06.025. Epub 2022 Jun 28.
The recent revolution in cryo-electron microscopy (cryo-EM) has made it possible to determine macromolecular structures directly from cell extracts. However, identifying the correct protein from a cryo-EM map remains challenging and often requires additional sequence information from other techniques, such as tandem mass spectrometry and/or bioinformatics. Here, we present DeepTracer-ID, a server-based approach that identifies the candidate protein de novo from a cryo-EM map and a user-specified organism, without the need for additional information. Our method first uses DeepTracer to generate a protein backbone model that best represents the cryo-EM map; this model is then searched against the library of AlphaFold2 predictions for all proteins in the given organism. The method is highly accurate and robust for high-resolution cryo-EM maps: in all 13 experimental maps tested blindly, DeepTracer-ID identified the correct proteins as the top candidates. Eight of the maps were of known structures, while the other five, unpublished maps were validated by prior protein annotation and careful inspection of the model refined into the map. The program also showed promising results for both homomeric and heteromeric protein complexes. This platform was made possible by the recent breakthroughs in large-scale three-dimensional protein structure prediction.
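The search step described above (a traced backbone compared against every predicted structure for an organism, with candidates ranked by similarity) can be illustrated with a toy sketch. The abstract does not say how DeepTracer-ID scores a match, so the scoring here is a stand-in chosen for simplicity: a histogram of pairwise C-alpha distances, which is invariant to rotation and translation. All function names, parameters, and the toy coordinates are hypothetical and are not part of DeepTracer, DeepTracer-ID, or AlphaFold2.

```python
import math
from itertools import combinations


def distance_profile(coords, bin_width=2.0, max_dist=40.0):
    """Normalized histogram of pairwise C-alpha distances.

    Assumption: this alignment-free descriptor stands in for whatever
    structural comparison the real server performs.
    """
    n_bins = int(max_dist / bin_width)
    hist = [0] * n_bins
    for a, b in combinations(coords, 2):
        d = math.dist(a, b)
        # Distances beyond max_dist are clipped into the last bin.
        hist[min(int(d / bin_width), n_bins - 1)] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def profile_similarity(p, q):
    """Histogram intersection in [0, 1]; 1 means identical profiles."""
    return sum(min(x, y) for x, y in zip(p, q))


def rank_candidates(query_coords, library):
    """Return (score, name) pairs for every library entry, best first.

    `library` maps a hypothetical protein name to its C-alpha coordinates,
    standing in for the per-organism set of predicted structures.
    """
    query = distance_profile(query_coords)
    scored = [
        (profile_similarity(query, distance_profile(coords)), name)
        for name, coords in library.items()
    ]
    return sorted(scored, reverse=True)


def toy_helix(n):
    """Idealized helix-like trace: 2.3 A radius, 100 degrees and 1.5 A rise per residue."""
    step = math.radians(100.0)
    return [(2.3 * math.cos(i * step), 2.3 * math.sin(i * step), 1.5 * i)
            for i in range(n)]


def toy_strand(n):
    """Fully extended trace with 3.8 A between consecutive C-alpha atoms."""
    return [(0.0, 0.0, 3.8 * i) for i in range(n)]
```

For example, calling `rank_candidates(toy_helix(30), {"helix_like": ..., "extended": toy_strand(30)})` with a rotated and shifted copy of the helix as `helix_like` ranks that entry first, because rigid-body motion leaves the distance profile unchanged. A real search would need a comparison that tolerates partial or fragmented backbone traces, which this sketch does not attempt.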