School of Information Engineering, University of Science and Technology Beijing, China.
Comput Biol Med. 2011 Oct;41(10):946-59. doi: 10.1016/j.compbiomed.2011.08.005. Epub 2011 Aug 30.
Methods for predicting protein secondary structures provide information that is useful both in ab initio structure prediction and as additional restraints for fold recognition algorithms. Secondary structure predictions may also be used to guide the design of site-directed mutagenesis studies and to locate potentially functionally important residues. In this article, we propose a multi-modal back propagation neural network (MMBP) method for predicting protein secondary structures. Using a method based on the Knowledge Discovery Theory based on Inner Cognitive Mechanism (KDTICM), we have constructed a compound pyramid model (CPM) composed of three layers of intelligent interfaces that integrate the MMBP, a mixed-modal SVM (MMS), a modified Knowledge Discovery in Databases (KDD*) process, and other components. The CPM method is available both as an integrated web server and as a standalone application, and it exploits recent advances in knowledge discovery and machine learning to perform very accurate protein secondary structure predictions. On a non-redundant test dataset of 256 proteins from RCASP256, the CPM method achieves an average Q3 score of 86.13% (SOV99=84.66%). Extensive testing indicates that this is significantly better than any other method currently available. Assessments using the RS126 and CB513 datasets indicate that the CPM method can achieve average Q3 scores approaching 83.99% (SOV99=80.25%) and 85.58% (SOV99=81.15%), respectively. By using both sequence and structure databases and by exploiting the latest techniques in machine learning, it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called CPM, which performs these secondary structure predictions, is accessible at http://kdd.ustb.edu.cn/protein_Web/.
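For readers unfamiliar with the accuracy figures quoted above, the three-state accuracy usually written Q3 is the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. The following is a minimal sketch of that standard definition only; it is not the CPM implementation, the function names `q3_score` and `mean_q3` are hypothetical, and averaging per protein is an assumption, since the abstract does not say how its averages were formed. SOV99, the segment-overlap measure, is more involved and is not reproduced here.

```python
from typing import Iterable, Tuple


def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def mean_q3(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average Q3 over proteins (assumed here: unweighted per-protein mean)."""
    scores = [q3_score(pred, obs) for pred, obs in pairs]
    if not scores:
        raise ValueError("at least one protein is required")
    return sum(scores) / len(scores)


# Illustrative toy data, not taken from RCASP256, RS126, or CB513.
observed = "CCHHHHEEEC"
predicted = "CCHHHCEEEC"  # one helix residue mispredicted as coil
print(q3_score(predicted, observed))  # 9 of 10 residues match
```

A per-residue average over the whole dataset would weight long proteins more heavily than the per-protein mean sketched here, so the two conventions can give slightly different figures.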