Sun Weitao, He Jing
Department of Computer Science, New Mexico State University, Las Cruces, 88003, USA.
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S40. doi: 10.1186/1471-2105-10-S1-S40.
Electron cryomicroscopy is a fast-developing technique for determining the 3-dimensional structures of large protein complexes. With this technique, protein density maps can be generated at 6 to 10 Å resolution. At such resolutions, secondary structure elements such as helices and beta-strands appear as skeletons and can be detected computationally. However, it is not known which segment of the protein sequence corresponds to which skeleton. In this paper, topology refers to the linear order and the directionality of the secondary structures. For a protein with N helices and M strands, there are (N!·2^N)(M!·2^M) different topologies: N! orderings and 2^N directions for the helices, and likewise M! orderings and 2^M directions for the strands. Each topology maps the N helix segments and M strand segments of the protein sequence onto the N helix skeletons and M strand skeletons. Because the skeletons do not reveal backbone positions, each topology still leaves additional freedom in how the atoms are positioned along the skeletons.
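To make the size of the topology space concrete, the short Python sketch below evaluates (N!·2^N)(M!·2^M) and checks it against a brute-force enumeration of helix topologies (every permutation of segments paired with every choice of direction). The function names `topology_count` and `enumerate_helix_topologies` are hypothetical illustrations, not part of the authors' method.

```python
from itertools import permutations, product
from math import factorial


def topology_count(n_helices: int, n_strands: int) -> int:
    """(N! * 2^N) * (M! * 2^M): orderings times directions for helices and strands."""
    helix_part = factorial(n_helices) * 2 ** n_helices
    strand_part = factorial(n_strands) * 2 ** n_strands
    return helix_part * strand_part


def enumerate_helix_topologies(n: int):
    """Yield each topology as ((skeleton, direction), ...), one pair per sequence segment.

    Hypothetical helper: segment i is assigned to skeleton order[i] and runs
    in direction +1 or -1 along that skeleton.
    """
    for order in permutations(range(n)):
        for dirs in product((1, -1), repeat=n):
            yield tuple(zip(order, dirs))


# Brute-force check of the closed form for small numbers of helices.
for n in range(1, 6):
    assert sum(1 for _ in enumerate_helix_topologies(n)) == topology_count(n, 0)

print(topology_count(3, 2))  # 3 helices and 2 strands
print(topology_count(8, 0))  # 8 helices, no strands
```

The count grows factorially: three helices and two strands already give 384 topologies, and eight helices alone give more than ten million, which is why the method samples and ranks the space rather than relying on exhaustive structure building.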
We have developed a method to construct the possible atomic structures for the helix skeletons by sampling the solution space of all possible topologies of the skeletons. Our method also ranks the possible structures by the contact energy formed between the secondary structures, rather than that of the entire chain. If we assume that the backbone atomic positions of the skeletons are known, the native topology of the secondary structures falls within the top 30% of the ranked list of all possible topologies for all 30 proteins tested, and within the top 5% for most of them. Without assuming the backbone location of the skeletons, the possible atomic structures can be constructed from the axes of the skeletons and the sequence segments. The best constructed structure for the skeletons has an RMSD to native of between 4 and 5 Å for the four tested alpha-proteins. These best constructed structures were ranked 17th, 31st, 16th and 5th for the four proteins, out of 32,066, 391,833, 98,755 and 192,935 possible assignments in the pool, respectively.
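The ranking step can be illustrated with a minimal Python sketch, assuming that lower contact energy means a better candidate. `rank_of` and `top_percent` are hypothetical helpers, and the energy function itself is not reproduced here; the sketch only shows how a candidate's rank is read off an energy-sorted list and how the reported ranks translate into a fraction of each pool.

```python
def rank_of(energies: dict, target) -> int:
    """1-based rank of `target` after sorting candidates by ascending energy.

    Assumes lower contact energy = better; ties keep insertion order because
    Python's sort is stable.
    """
    ordered = sorted(energies, key=energies.get)
    return ordered.index(target) + 1


def top_percent(rank: int, pool_size: int) -> float:
    """Position of a rank within the pool, as a percentage (smaller is better)."""
    return 100.0 * rank / pool_size


# (rank of best constructed structure, pool size) for the four alpha-proteins
reported = [(17, 32066), (31, 391833), (16, 98755), (5, 192935)]
for rank, pool in reported:
    print(f"rank {rank} of {pool}: top {top_percent(rank, pool):.3f}%")
```

Expressed this way, each of the four best constructed structures lies within roughly the top 0.05% of its pool, a much tighter cut than the top-30% and top-5% figures quoted for the native topologies.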
Our work suggests that directly estimating the contact energy formed by the secondary structures is effective in reducing the topological space to a small subset that includes a near-native structure for the skeletons.