Triage protein fold prediction.

Author information

He Hongxian, McAllister Gregory, Smith Temple F

Affiliation

BioMolecular Engineering Research Center, Biomedical Engineering Department, Boston University, Boston, Massachusetts 02215, USA.

Publication information

Proteins. 2002 Sep 1;48(4):654-63. doi: 10.1002/prot.10194.

DOI:10.1002/prot.10194

PMID:12211033

Abstract

We have constructed, in a completely automated fashion, a new structure template library for threading that represents 358 distinct SCOP folds where each model is mathematically represented as a Hidden Markov model (HMM). Because the large number of models in the library can potentially dilute the prediction measure, a new triage method for fold prediction is employed. In the first step of the triage method, the most probable structural class is predicted using a set of manually constructed, high-level, generalized structural HMMs that represent seven general protein structural classes: all-alpha, all-beta, alpha/beta, alpha+beta, irregular small metal-binding, transmembrane beta-barrel, and transmembrane alpha-helical. In the second step, only those fold models belonging to the determined structural class are selected for the final fold prediction. This triage method gave more predictions as well as more correct predictions compared with a simple prediction method that lacks the initial classification step. Two different schemes of assigning Bayesian model priors are presented and discussed.
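The two-step triage described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the HMM log-likelihoods have already been computed elsewhere, uses uniform model priors by default (the abstract does not specify its two prior schemes), and applies a hypothetical posterior cutoff to decide whether a prediction is reported. The names `posteriors`, `triage_predict`, and `threshold`, and all of the scores, are invented for illustration.

```python
import math


def posteriors(log_likelihoods, priors=None):
    """Bayes' rule over a finite set of models, computed in log space.

    log_likelihoods: dict mapping model name -> log P(sequence | model).
    priors: dict mapping model name -> prior probability (assumed uniform if None).
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    log_joint = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_joint.values())
    norm = sum(math.exp(v - peak) for v in log_joint.values())
    return {n: math.exp(v - peak) / norm for n, v in log_joint.items()}


def triage_predict(class_ll, fold_ll, fold_class, threshold=0.5):
    """Return (class, fold or None, fold posterior) for one query sequence.

    threshold is a hypothetical cutoff, not a value taken from the paper.
    """
    # Step 1: pick the most probable structural class.
    class_post = posteriors(class_ll)
    best_class = max(class_post, key=class_post.get)

    # Step 2: keep only the fold models that belong to that class.
    candidates = {f: ll for f, ll in fold_ll.items() if fold_class[f] == best_class}
    if not candidates:
        return best_class, None, 0.0
    fold_post = posteriors(candidates)
    best_fold = max(fold_post, key=fold_post.get)
    if fold_post[best_fold] < threshold:
        return best_class, None, fold_post[best_fold]
    return best_class, best_fold, fold_post[best_fold]


# Made-up scores for a single query; fold names are used only as labels.
class_ll = {"all-alpha": -100.0, "all-beta": -105.0}
fold_ll = {
    "globin-like": -98.0,
    "four-helical bundle": -101.0,
    "immunoglobulin-like": -98.5,
}
fold_class = {
    "globin-like": "all-alpha",
    "four-helical bundle": "all-alpha",
    "immunoglobulin-like": "all-beta",
}

best_class, best_fold, triage_p = triage_predict(class_ll, fold_ll, fold_class)
flat_p = posteriors(fold_ll)["globin-like"]  # no classification step
print(best_class, best_fold, round(triage_p, 3), round(flat_p, 3))
```

With these invented scores, the posterior of the top fold is higher after triage than when all fold models compete at once, which is the dilution effect the abstract refers to. Whether this holds in practice depends on how reliable the class prediction in step 1 is.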
