Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Boulevard, Galveston, TX 77555-0647, USA.
Bioinformatics. 2012 Dec 15;28(24):3265-73. doi: 10.1093/bioinformatics/bts616. Epub 2012 Nov 6.
Owing to the size and complexity of large multi-component biological assemblies, the most tractable approach to determining their atomic structure is often to fit high-resolution X-ray crystallographic or nuclear magnetic resonance structures of isolated components into lower-resolution electron density maps of the larger assembly obtained using cryo-electron microscopy (cryo-EM). This hybrid approach to structure determination requires that an atomic-resolution structure of each component, or a suitable homolog, is available. If neither is available, then the amount of structural information regarding that component is limited by the resolution of the cryo-EM map. However, even if a suitable homolog cannot be identified using sequence analysis, a search for structural homologs should still be performed, because structural homology often persists throughout evolution even when sequence homology is undetectable. As macromolecules can often be described as a collection of independently folded domains, one way of searching for structural homologs would be to systematically fit representative domain structures from a protein domain database into the medium- to low-resolution cryo-EM map and return the best fits. Taken together, the best-fitting non-overlapping structures would constitute a 'mosaic' backbone model of the assembly that could aid map interpretation and illuminate biological function.
Using the computational principles of the Scale-Invariant Feature Transform (SIFT), we have developed FOLD-EM, a computational tool that can identify folded macromolecular domains in medium- to low-resolution (4-15 Å) electron density maps and return a model of the constituent polypeptides in a fully automated fashion. As a by-product, FOLD-EM can also perform flexible multi-domain fitting, which may provide insight into conformational changes that occur in macromolecular assemblies.
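The 'mosaic' idea described above, keeping the best-fitting domain placements that do not overlap one another, can be illustrated with a small sketch. This is not FOLD-EM's algorithm, which the abstract does not specify at this level. It is a simple greedy selection under stated assumptions: each candidate fit is assumed to carry a score (higher is better) and the set of map voxels it occupies. The names `Fit`, `overlap_fraction`, `build_mosaic` and `box`, and the `max_overlap` threshold, are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Voxel = Tuple[int, int, int]


@dataclass(frozen=True)
class Fit:
    """One candidate placement of a domain structure in the map (hypothetical)."""
    domain_id: str             # assumed identifier of a domain-database entry
    score: float               # assumed fit score; higher means better agreement
    voxels: FrozenSet[Voxel]   # map voxels covered by the placed domain


def overlap_fraction(a: Fit, b: Fit) -> float:
    """Fraction of the smaller footprint that is shared by both placements."""
    smaller = min(len(a.voxels), len(b.voxels))
    if smaller == 0:
        return 0.0
    return len(a.voxels & b.voxels) / smaller


def build_mosaic(fits: List[Fit], max_overlap: float = 0.1) -> List[Fit]:
    """Greedily keep the best-scoring fits that do not overlap earlier picks."""
    mosaic: List[Fit] = []
    for fit in sorted(fits, key=lambda f: f.score, reverse=True):
        if all(overlap_fraction(fit, kept) <= max_overlap for kept in mosaic):
            mosaic.append(fit)
    return mosaic


def box(x0: int, x1: int) -> FrozenSet[Voxel]:
    """Toy footprint: a 2 x 2 column of voxels spanning x0 <= x < x1."""
    return frozenset(
        (x, y, z) for x in range(x0, x1) for y in range(2) for z in range(2)
    )


# Made-up candidates purely for illustration; the scores are not real data.
candidates = [
    Fit("domain_A", 0.91, box(0, 5)),
    Fit("domain_B", 0.85, box(3, 8)),    # overlaps domain_A
    Fit("domain_C", 0.80, box(6, 11)),   # clear of domain_A
]
mosaic = build_mosaic(candidates)
print([f.domain_id for f in mosaic])
```

In this toy case `domain_B` is discarded because it shares too many voxels with the higher-scoring `domain_A`, while `domain_C` is kept. A greedy pass is only one possible design choice, since it can miss a combination of lower-scoring fits that covers the map better overall.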