Shanghai Key Lab of Intelligent Information Processing and the School of Computer Science, Fudan University, Old Yifu Building, Room 202-5, 220 Handan Road, Shanghai 200433, China.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Mar-Apr;8(2):476-86. doi: 10.1109/TCBB.2010.86.
The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task in solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structure. Almost all current knowledge-based potentials take the form of a weighted linear sum of interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained from nonlinear classifiers, rather than from the relative frequencies of interaction pairs against a reference state or from linear classifiers. A support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force potentials, either Boltzmann-based or linear, are introduced, and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys “R” Us data set. The evaluation metrics are the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures.
Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.
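The two evaluation metrics named above can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the use of the population standard deviation, and the sign convention (a more negative Z score means the native structure is better separated from the decoys) are all assumptions chosen for illustration, and the energies are made-up numbers.

```python
import statistics


def energy_z_score(native_energy, decoy_energies):
    """Z score of the native energy against the decoy energy distribution.

    Assumption: population standard deviation over the decoys only;
    other conventions (sample std, including the native) also exist.
    """
    mean = statistics.fmean(decoy_energies)
    std = statistics.pstdev(decoy_energies)
    if std == 0:
        raise ValueError("decoy energies have zero variance")
    return (native_energy - mean) / std


def native_rank(native_energy, decoy_energies):
    """1-based rank of the native structure; 1 means it has the lowest energy."""
    return 1 + sum(e < native_energy for e in decoy_energies)


# Hypothetical energies for one native structure and five decoys.
native = -120.0
decoys = [-100.0, -90.0, -110.0, -95.0, -105.0]

z = energy_z_score(native, decoys)
rank = native_rank(native, decoys)
print(f"Z score: {z:.3f}, native rank: {rank}")
```

Under these assumptions, a potential discriminates the native structure in a decoy set when the rank is 1, and a more negative Z score indicates a larger energy gap relative to the spread of the decoys.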