An efficient computational method for globally optimal threading.

Author information

Xu Y, Xu D, Uberbacher E C

Affiliation

Computational Biosciences Section, Oak Ridge National Laboratory, Tennessee 37831-6480, USA.

Publication information

J Comput Biol. 1998 Fall;5(3):597-614. doi: 10.1089/cmb.1998.5.597.

DOI:10.1089/cmb.1998.5.597

PMID:9773353

Abstract

Computational recognition of native-like folds of an anonymous amino acid sequence from a protein fold database is considered to be a promising approach to the three-dimensional (3D) fold prediction of the amino acid sequence. We present a new method for protein fold recognition through optimally aligning an amino acid sequence and a protein fold template (protein threading). The fitness of aligning an amino acid sequence with a fold template is measured by (1) the singleton fitness, representing the compatibility of substituting one amino acid by another and the combined preference of secondary structure and solvent accessibility for a particular amino acid, (2) the pairwise interaction, representing the contact preference between a pair of amino acids, and (3) alignment gap penalties. Though a protein threading problem so defined is known to be NP-hard in the most general sense, our algorithm runs efficiently if we place a cutoff distance on the pairwise interactions, as many of the existing threading programs do. For an amino acid sequence of size n and a fold template of size m with M core secondary structures, the algorithm finds an optimal alignment in $O(Mn^{1.5C+1} + mn^{C+1})$ time and $O(Mn^{C+1})$ space, where C is a (small) nonnegative integer, determined by a particular mathematical property of the pairwise interactions. As a case study, we have demonstrated that C is less than or equal to 4 for about 75% of the 293 unique folds in our protein database, when pairwise interactions are restricted to amino acids ≤ 7 Å apart (measured between their beta carbon atoms). An approximation scheme is developed for fold templates with C > 4, when threading requires too much memory and time to be practical on a typical workstation.
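
To make the three-part fitness measure concrete, the following is a minimal sketch of how one candidate alignment might be scored. It is not the authors' implementation, and it only evaluates a given alignment; it does not perform the optimal search described in the abstract. The function name `threading_score`, the data layout, the environment labels, the toy energy values, the "lower is better" sign convention, and the flat per-position gap penalty are all assumptions made for illustration. Only the three terms and the 7 Å cutoff between beta carbons come from the abstract.

```python
from itertools import combinations
from math import dist


def threading_score(sequence, template, alignment, singleton, pairwise,
                    gap_penalty=1.0, cutoff=7.0):
    """Score one sequence-to-template alignment (lower is better, by assumption).

    sequence  : string of one-letter amino acid codes
    template  : list of (environment_label, (x, y, z)) per template position;
                the coordinates stand in for the beta carbon
    alignment : list with one entry per template position, giving the index of
                the aligned sequence residue, or None for a gap
    singleton : dict mapping (environment_label, amino_acid) -> energy
    pairwise  : dict mapping frozenset({aa1, aa2}) -> contact energy
    """
    score = 0.0

    # (1) Singleton term: fit of each aligned residue to its template position.
    for (env, _), seq_idx in zip(template, alignment):
        if seq_idx is not None:
            score += singleton.get((env, sequence[seq_idx]), 0.0)

    # (2) Pairwise term: only template positions within the cutoff interact.
    for i, j in combinations(range(len(template)), 2):
        si, sj = alignment[i], alignment[j]
        if si is None or sj is None:
            continue
        if dist(template[i][1], template[j][1]) <= cutoff:
            score += pairwise.get(frozenset((sequence[si], sequence[sj])), 0.0)

    # (3) Gap term: a flat penalty per unaligned template position
    #     (a simplification; real gap models are usually richer).
    score += gap_penalty * sum(1 for s in alignment if s is None)
    return score


# Toy data, entirely hypothetical.
template = [("helix_buried", (0.0, 0.0, 0.0)),
            ("helix_exposed", (5.0, 0.0, 0.0)),
            ("loop_exposed", (20.0, 0.0, 0.0))]
sequence = "LKG"
alignment = [0, 1, None]  # third template position is left as a gap
singleton = {("helix_buried", "L"): -1.0, ("helix_exposed", "K"): -0.5}
pairwise = {frozenset("LK"): -0.25}

example_score = threading_score(sequence, template, alignment,
                                singleton, pairwise)
print(example_score)  # singleton -1.5, pairwise -0.25, gap +1.0
```

In this sketch, tightening `cutoff` below 5.0 drops the single contact in the toy template, which illustrates why the cutoff limits how many pairwise terms an optimizer has to track.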
