Krishnamoorthy Bala, Tropsha Alexander
Department of Operations Research, CB 3180, UNC Chapel Hill, NC 27599, USA.
Bioinformatics. 2003 Aug 12;19(12):1540-8. doi: 10.1093/bioinformatics/btg186.
Most scoring functions used in protein fold recognition employ two-body (pseudo) potential energies. The use of higher-order terms may improve the performance of current algorithms.
Proteins are represented by the side chain centroids of their amino acids. The Delaunay tessellation of this point set defines all nearest-neighbor quadruplets of amino acids. A four-body contact scoring function (log likelihoods of residue quadruplet compositions) is derived from the analysis of a diverse set of proteins with known structures. A test protein is characterized by its total score, calculated as the sum of the individual log likelihoods of its constituent amino acid quadruplets.
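The pipeline described above (centroids, Delaunay quadruplets, log-likelihood table, summed score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`delaunay_tetrahedra`, `train_log_likelihoods`, `total_score`) are hypothetical. The tessellation is a brute-force empty-circumsphere test, which is only practical for toy inputs and assumes points in general position; real structures would use a library routine such as `scipy.spatial.Delaunay(points).simplices`. The reference state is also an assumption: expected quadruplet frequencies are a multinomial factor times the product of residue-type frequencies, scores use `log10`, and compositions unseen in training contribute 0. The paper's exact normalization may differ.

```python
import math
from collections import Counter
from itertools import combinations


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(a, b, c, d):
    """Return (center, radius^2) of the sphere through four points, or None if coplanar."""
    # Equidistance from a and p gives the linear equation 2(p - a).x = |p|^2 - |a|^2.
    rows = [[2 * (p[i] - a[i]) for i in range(3)] for p in (b, c, d)]
    rhs = [sum(p[i] ** 2 - a[i] ** 2 for i in range(3)) for p in (b, c, d)]
    det = _det3(rows)
    if abs(det) < 1e-12:
        return None
    center = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side.
        m = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(rows)]
        center.append(_det3(m) / det)
    r2 = sum((center[i] - a[i]) ** 2 for i in range(3))
    return center, r2


def delaunay_tetrahedra(points, eps=1e-9):
    """Brute-force 3D Delaunay: keep every tetrahedron whose circumsphere
    contains no other point. O(n^5); assumes no five points are cospherical."""
    tets = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue  # skip coplanar (degenerate) quadruples
        center, r2 = sphere
        if all(sum((points[j][k] - center[k]) ** 2 for k in range(3)) > r2 - eps
               for j in range(len(points)) if j not in quad):
            tets.append(quad)
    return tets


def composition(tet, residues):
    """Quadruplet composition: the sorted residue types at the four vertices."""
    return tuple(sorted(residues[i] for i in tet))


def train_log_likelihoods(training_set):
    """training_set: list of (centroids, residue_types) pairs for known structures.
    Returns {composition: log10(observed frequency / expected frequency)}."""
    comp_counts = Counter()
    residue_counts = Counter()
    for points, residues in training_set:
        residue_counts.update(residues)
        for tet in delaunay_tetrahedra(points):
            comp_counts[composition(tet, residues)] += 1
    n_tets = sum(comp_counts.values())
    n_res = sum(residue_counts.values())
    scores = {}
    for comp, count in comp_counts.items():
        observed = count / n_tets
        # Assumed reference state: number of distinct orderings of the
        # composition times the product of residue-type frequencies.
        multiplicity = math.factorial(4)
        for t in Counter(comp).values():
            multiplicity //= math.factorial(t)
        expected = multiplicity * math.prod(residue_counts[r] / n_res for r in comp)
        scores[comp] = math.log10(observed / expected)
    return scores


def total_score(points, residues, scores):
    """Sum of log likelihoods over all Delaunay quadruplets; unseen compositions add 0."""
    return sum(scores.get(composition(tet, residues), 0.0)
               for tet in delaunay_tetrahedra(points))
```

For example, four centroids at the corners of a tetrahedron plus one interior centroid tessellate into four quadruplets, each sharing the interior point. Training and scoring on the same toy structure then simply sums the four looked-up log likelihoods.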
The scoring function distinguishes native structures from partially unfolded or deliberately misfolded ones. It also discriminates between pre-transition-state, post-transition-state, and native structures in the folding simulation trajectory of Chymotrypsin Inhibitor 2 (CI2).