考虑序列氨基酸相互作用偏好的蛋白质穿线问题是NP完全问题。

The protein threading problem with sequence amino acid interaction preferences is NP-complete.

作者信息

Lathrop R H

机构信息

Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge 02139.

出版信息

Protein Eng. 1994 Sep;7(9):1059-68. doi: 10.1093/protein/7.9.1059.

DOI:10.1093/protein/7.9.1059

PMID:7831276

Abstract

In recent protein structure prediction research there has been a great deal of interest in using amino acid interaction preferences (e.g. contact potentials or potentials of mean force) to align ('thread') a protein sequence to a known structural motif. An important open question is whether a polynomial time algorithm for finding the globally optimal threading is possible. We identify the two critical conditions governing this question: (i) variable-length gaps are admitted into the alignment, and (ii) interactions between amino acids from the sequence are admitted into the score function. We prove that if both these conditions are allowed then the protein threading decision problem (does there exist a threading with a score < or = K?) is NP-complete (in the strong sense, i.e. is not merely a number problem) and the related problem of finding the globally optimal protein threading is NP-hard. Therefore, no polynomial time algorithm is possible (unless P = NP). This result augments existing proofs that the direct protein folding problem is NP-complete by providing the corresponding proof for the 'inverse' protein folding problem. It provides a theoretical basis for understanding algorithms currently in use and indicates that computational strategies from other NP-complete problems may be useful for predictive algorithms.

摘要

在最近的蛋白质结构预测研究中，人们对使用氨基酸相互作用偏好（例如接触势或平均力势）将蛋白质序列与已知结构基序进行比对（“穿线法”）产生了浓厚兴趣。一个重要的开放性问题是，是否有可能存在一种多项式时间算法来找到全局最优穿线法。我们确定了支配这个问题的两个关键条件：（i）比对中允许可变长度的空位，以及（ii）序列中氨基酸之间的相互作用被纳入评分函数。我们证明，如果允许这两个条件，那么蛋白质穿线决策问题（是否存在得分≤K的穿线法？）是NP完全问题（从严格意义上讲，即不仅仅是一个数值问题），并且找到全局最优蛋白质穿线法的相关问题是NP难问题。因此，不存在多项式时间算法（除非P = NP）。这一结果通过为“反向”蛋白质折叠问题提供相应证明，扩充了现有的关于直接蛋白质折叠问题是NP完全问题的证明。它为理解当前使用的算法提供了理论基础，并表明来自其他NP完全问题的计算策略可能对预测算法有用。