White J V, Muchnik I, Smith T F
TASC, Walkers Brook Drive, Reading, Massachusetts.
Math Biosci. 1994 Dec;124(2):149-79. doi: 10.1016/0025-5564(94)90041-8.
A mathematical formalism is introduced that has general applicability to many protein structure models used in the various approaches to the "inverse protein folding problem." The inverse nature of the problem arises from the fact that one begins with a set of assumed tertiary structures and searches for those most compatible with a new sequence, rather than attempting to predict the structure directly from the new sequence. The formalism is based on the well-known theory of Markov random fields (MRFs). Our MRF formulation provides explicit representations for the relevant amino acid position environments and the physical topologies of the structural contacts. In particular, MRF models can readily be constructed for the secondary structure packing topologies found in protein domain cores, or other structural motifs, that are anticipated to be common among large sets of both homologous and nonhomologous proteins. MRF models are probabilistic and can exploit the statistical data from the limited number of proteins having known domain structures. The MRF approach leads to a new scoring function for comparing different threadings (placements) of a sequence through different structure models. This scoring function is important because comparing alternative structure models with one another is a key step in the inverse folding problem. Unlike previously published scoring functions, the one derived in this paper is based on a comprehensive probabilistic formulation of the threading problem.
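As a rough illustration of the kind of model the abstract describes, a pairwise MRF over the core positions of one structure model assigns a probability P(x) ∝ exp(Σ_i φ_i(x_i) + Σ_(i,j) ψ_ij(x_i, x_j)), where x_i is the residue placed at position i, φ_i depends on that position's environment, and ψ_ij is defined on pairs of positions in structural contact. The sketch scores a threading by this unnormalized log-probability. Everything concrete here is a hypothetical assumption for illustration: the two-letter hydrophobic/polar (H/P) alphabet, the "buried"/"exposed" environment labels, the potential values, the requirement that threadings preserve sequence order, and the names `CoreModel` and `mrf_score`. It is not the scoring function derived in the paper.

```python
from dataclasses import dataclass

# Hypothetical singleton log-potentials: environment -> residue class -> value.
SINGLETON = {
    "buried": {"H": 1.0, "P": -1.0},   # hydrophobic favored when buried
    "exposed": {"H": -0.5, "P": 0.5},  # polar favored when exposed
}

# Hypothetical pairwise log-potentials for two residue classes in contact.
PAIR = {("H", "H"): 0.8, ("H", "P"): -0.2, ("P", "H"): -0.2, ("P", "P"): 0.1}

# Hypothetical reduced alphabet: these one-letter codes count as hydrophobic.
HYDROPHOBIC = set("AVILMFWC")


def residue_class(aa: str) -> str:
    """Map a one-letter amino acid code to the H/P alphabet."""
    return "H" if aa in HYDROPHOBIC else "P"


@dataclass
class CoreModel:
    environments: list[str]          # one environment label per core position
    contacts: list[tuple[int, int]]  # pairs of core positions in contact


def mrf_score(model: CoreModel, sequence: str, threading: list[int]) -> float:
    """Sum of singleton and pairwise log-potentials for one threading.

    threading[i] is the index of the sequence residue placed at core
    position i; indices must be strictly increasing (order-preserving).
    """
    if len(threading) != len(model.environments):
        raise ValueError("threading must assign one residue per core position")
    if any(a >= b for a, b in zip(threading, threading[1:])):
        raise ValueError("threading must preserve sequence order")
    if threading and not (0 <= threading[0] and threading[-1] < len(sequence)):
        raise ValueError("threading index out of range")
    classes = [residue_class(sequence[k]) for k in threading]
    # Singleton cliques: each position's environment with its residue class.
    score = sum(SINGLETON[env][c] for env, c in zip(model.environments, classes))
    # Pairwise cliques: one term per structural contact.
    score += sum(PAIR[(classes[i], classes[j])] for i, j in model.contacts)
    return score


if __name__ == "__main__":
    model = CoreModel(
        environments=["buried", "exposed", "buried", "buried"],
        contacts=[(0, 2), (2, 3)],
    )
    seq = "KLAVSEIL"
    print(f"{mrf_score(model, seq, [0, 1, 2, 3]):.2f}")  # prints 1.10
    print(f"{mrf_score(model, seq, [2, 4, 6, 7]):.2f}")  # prints 5.10
```

Because all threadings on the same model share one normalizing constant Z, ranking them by this raw score is consistent with ranking them by MRF probability. Comparing threadings across different structure models would also have to account for each model's own normalization, which this sketch does not attempt.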