Dong Qiwen, Wang Xiaolong, Lin Lei
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
Proteins. 2008 Jul;72(1):353-66. doi: 10.1002/prot.21931.
In recent years, protein structure prediction using local structure information has made great progress. In this study, a novel and effective method is developed to predict the local structures and the folding fragments of proteins. First, proteins with known structures are split into fragments. Second, these fragments, represented by their dihedral angles, are clustered to produce the building blocks (BBs). Third, an efficient machine learning method is used to predict the local structures of proteins from sequence profiles. Finally, a bi-gram model, trained by an iterative algorithm, is introduced to model the interactions among these BBs. For each test protein, a building-block lattice is constructed that contains all the folding fragments of the protein. The local structures and the optimal fragments are then obtained by a dynamic programming algorithm. The experiment is performed on a subset of the PDB database with sequence identity below 25%. The results show that the method outperforms an approach that uses only sequence information. When multiple paths are returned, the average classification accuracy of local structures is 72.27% and the average prediction accuracy of local structures is 67.72%, a significant improvement over previous studies. The method can predict not only the local structures but also the folding fragments of proteins. This work is useful for ab initio protein structure prediction and, especially, for understanding the folding process of proteins.
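The decoding step described in the abstract, a dynamic programming search over a building-block lattice scored with a bi-gram model, can be illustrated with a minimal sketch. The abstract does not give the scoring function, so this assumes a Viterbi-style recursion over log-probabilities. The function name `best_path` and the inputs `emission` (per-position BB probabilities, standing in for the output of the local-structure predictor) and `transition` (bi-gram probabilities between consecutive BBs) are hypothetical. It returns only the single best path, not the multiple paths mentioned in the abstract.

```python
import math


def best_path(emission, transition):
    """Return the highest-scoring building-block path and its log-score.

    emission[t][k]   : assumed probability of building block k at position t
    transition[j][k] : assumed bi-gram probability of block k following block j
    Both inputs are illustrative; the paper's actual scores are not given here.
    """
    def log(p):
        # Treat zero probability as an impossible step.
        return math.log(p) if p > 0 else float("-inf")

    n_pos = len(emission)
    n_bb = len(emission[0])

    # score[t][k]: best log-score of any path ending in block k at position t.
    score = [[float("-inf")] * n_bb for _ in range(n_pos)]
    # back[t][k]: predecessor block on that best path.
    back = [[0] * n_bb for _ in range(n_pos)]

    for k in range(n_bb):
        score[0][k] = log(emission[0][k])

    for t in range(1, n_pos):
        for k in range(n_bb):
            best_j = max(
                range(n_bb),
                key=lambda j: score[t - 1][j] + log(transition[j][k]),
            )
            score[t][k] = (
                score[t - 1][best_j]
                + log(transition[best_j][k])
                + log(emission[t][k])
            )
            back[t][k] = best_j

    # Trace back from the best final block.
    last = max(range(n_bb), key=lambda k: score[-1][k])
    path = [last]
    for t in range(n_pos - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][last]
```

With uniform transitions the result reduces to the per-position best block. A strongly self-favoring bi-gram model can override a weak local preference, which is the kind of context effect a bi-gram model adds over purely local prediction.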