DPANN: improved sequence to structure alignments following fold recognition.

Author information

Reinhardt Astrid, Eisenberg David

Affiliation

Faint Signals Pattern Recognition, Los Angeles, California, USA.

Publication information

Proteins. 2004 Aug 15;56(3):528-38. doi: 10.1002/prot.20144.

PMID:15229885

Abstract

In fold recognition (FR), a protein sequence of unknown structure is assigned to the closest known three-dimensional (3D) fold. FR programs can often identify which of all possible folds a sequence adopts, but they frequently fail to align the sequence to the equivalent residue positions in that fold. Such failures frustrate the next step in structure prediction, protein model building. It is therefore desirable to improve the quality of the alignments between the sequence and the identified structure. We have used artificial neural networks (ANN) to derive a substitution matrix for creating alignments between a protein sequence and a protein structure by dynamic programming (DPANN: Dynamic Programming meets Artificial Neural Networks). The matrix is based on the amino acid type and the secondary structure state of each residue. In a database of protein pairs that share the same fold but lack sequence similarity, DPANN aligns over 30% of the sequences to their paired structures in a way that closely resembles the structural superposition of the pair. In over half of these cases, the DPANN alignment is close to the structural superposition even though the initial alignment from the fold-recognition step is not. Conversely, the alignment created during fold recognition outperforms DPANN in only 10% of all cases. Applying DPANN after fold recognition thus substantially improves alignment accuracy, which in turn provides more useful templates for modeling protein structures. In the artificial case where the actual rather than the predicted secondary structure of the probe protein is used, over 50% of the alignments are successful.
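The abstract describes aligning a probe sequence to a template structure by dynamic programming, using a substitution score that depends on both the amino acid type and the secondary structure state of each residue. The Python sketch below illustrates that general idea under several assumptions. It uses a global (Needleman-Wunsch) alignment with a linear gap penalty, although the abstract does not say whether DPANN aligns globally or locally or how it penalizes gaps. `toy_score` is a hypothetical placeholder for the ANN-derived matrix, whose values are not given here. `fraction_shared` is an assumed accuracy measure: the fraction of residue pairs in a reference alignment, such as the structural superposition, that the computed alignment reproduces. It is not necessarily the criterion used in the paper.

```python
from typing import Callable, List, Optional, Tuple

# A residue is (one-letter amino acid code, secondary structure state 'H', 'E' or 'C').
# For the probe the state would be predicted; for the template it comes from the structure.
Residue = Tuple[str, str]
ScoreFn = Callable[[Residue, Residue], float]
Pair = Tuple[Optional[int], Optional[int]]


def toy_score(probe: Residue, template: Residue) -> float:
    """Hypothetical stand-in for the DPANN matrix (NOT the published values)."""
    aa_term = 2.0 if probe[0] == template[0] else -1.0
    ss_term = 1.0 if probe[1] == template[1] else -0.5
    return aa_term + ss_term


def align(probe: List[Residue], template: List[Residue],
          score: ScoreFn = toy_score, gap: float = -2.0) -> Tuple[float, List[Pair]]:
    """Global alignment with a linear gap penalty (assumed scheme).

    Returns the optimal score and aligned index pairs; None marks a gap.
    """
    n, m = len(probe), len(template)
    # F[i][j] = best score for aligning probe[:i] with template[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(probe[i - 1], template[j - 1]),  # match
                F[i - 1][j] + gap,  # probe residue opposite a gap
                F[i][j - 1] + gap,  # template residue opposite a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(probe[i - 1], template[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


def fraction_shared(pairs: List[Pair], reference: List[Pair]) -> float:
    """Fraction of aligned reference pairs that also appear in `pairs` (assumed metric)."""
    ref = {p for p in reference if None not in p}
    got = {p for p in pairs if None not in p}
    return len(ref & got) / len(ref) if ref else 0.0


if __name__ == "__main__":
    probe = [("A", "H"), ("L", "H"), ("G", "C"), ("V", "E")]
    template = [("A", "H"), ("L", "H"), ("V", "E")]
    best, aln = align(probe, template)
    print(best, aln)  # → 7.0 [(0, 0), (1, 1), (2, None), (3, 2)]
    print(fraction_shared(aln, [(0, 0), (1, 1), (3, 2)]))  # → 1.0
```

Because the score function takes whole residues rather than bare letters, secondary structure agreement can reward aligning a predicted helix onto a template helix even when the amino acids differ. That is the kind of signal a sequence-only matrix cannot capture.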
