Fechteler T, Dengler U, Schomburg D
GBF (Gesellschaft für Biotechnologische Forschung) Department of Molecular Structure Research, Braunschweig, Germany.
J Mol Biol. 1995 Oct 13;253(1):114-31. doi: 10.1006/jmbi.1995.0540.
Predicting protein structure in insertion/deletion regions (indels) is an important part of protein model building by homology. Here we combine cluster analysis with database search procedures. First, databases of representative protein fragments are constructed using two different clustering algorithms. A protein fragment consists of two anchoring regions and a central region. In the HCAPD (hierarchical clustering after preliminary division) approach, all protein fragments are divided into classes with similar anchor region structures, and within each class the fragments are further clustered using a hierarchical cluster algorithm. The DCANN (deterministic clustering by assignment of all nearest neighbours) approach is a variant of the k-nearest neighbours cluster algorithm. Only geometric scoring criteria are used for database searching. The main advantage of a non-redundant database is that it provides structurally different fragments during the search, which improves structure prediction. Both methods have been tested on 71 insertions and 74 deletions with lengths between one and eight residues.
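The two-stage idea behind HCAPD (divide fragments by anchor geometry, then cluster hierarchically within each class) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the anchor descriptor, the distance measure, the linkage rule, or any thresholds. The sketch assumes a hypothetical anchor-to-anchor distance (`anchor_dist`), a hypothetical fixed-length feature vector for the central region (`features`), Euclidean distance, complete linkage, and arbitrary values for `bin_width` and `cutoff`.

```python
from collections import defaultdict
from math import dist


def preliminary_division(fragments, bin_width=1.0):
    """Stage 1: bin fragments by a coarse anchor descriptor.

    Assumption: 'anchor_dist' is a hypothetical anchor-to-anchor distance;
    the abstract only says classes share similar anchor region structures.
    """
    classes = defaultdict(list)
    for frag in fragments:
        classes[int(frag["anchor_dist"] // bin_width)].append(frag)
    return classes


def complete_linkage_clusters(items, cutoff):
    """Stage 2: agglomerative clustering with complete linkage (assumed)."""
    clusters = [[item] for item in items]

    def linkage(a, b):
        # Complete linkage: distance between the two most distant members.
        return max(dist(x["features"], y["features"]) for x in a for y in b)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break  # no remaining pair is close enough to merge
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def representatives(fragments, bin_width=1.0, cutoff=0.5):
    """Return one medoid per cluster as a non-redundant fragment set."""
    reps = []
    for _, members in sorted(preliminary_division(fragments, bin_width).items()):
        for cluster in complete_linkage_clusters(members, cutoff):
            # Medoid: the member with the smallest summed distance to the rest.
            reps.append(min(
                cluster,
                key=lambda f: sum(dist(f["features"], g["features"]) for g in cluster),
            ))
    return reps


# Toy data, invented purely for illustration.
fragments = [
    {"id": "a", "anchor_dist": 5.2, "features": (0.0, 0.0)},
    {"id": "b", "anchor_dist": 5.4, "features": (0.1, 0.0)},
    {"id": "c", "anchor_dist": 5.6, "features": (3.0, 3.0)},
    {"id": "d", "anchor_dist": 9.1, "features": (0.0, 0.1)},
]
rep_ids = [f["id"] for f in representatives(fragments)]
```

In this toy set, fragments "a" and "b" are near-duplicates within the same anchor class, so only one of them is kept as a representative, while the structurally different "c" and the differently anchored "d" are both retained. That is the sense in which a non-redundant database can offer structurally different candidates during a search.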