Koch I, Lengauer T
Institute for Algorithms and Scientific Computing, GMD-German National Research Center for Information Technology, St. Augustin, Germany.
Proc Int Conf Intell Syst Mol Biol. 1997;5:167-78.
We introduce a method for finding weak structural similarities in a set of protein structures. Proteins are considered at their secondary structure level. The method uses a rigorous graph-theoretical algorithm which finds all structural similarities. Protein structures are modelled as undirected labelled graphs, the so-called protein graphs. We suggest that for detecting the similarities between two protein structures it is sufficient to find similarities in the protein core which consists of tightly packed secondary structure elements. Therefore, we can restrict ourselves to solving the maximal common connected subgraph problem instead of the maximal common subgraph problem. We have modified the algorithm by Bron and Kerbosch for solving that problem. The speed of the algorithm increases drastically. After calculating all maximal common connected substructures for all pairwise comparisons in a set of protein graphs the common substructure in all proteins can be calculated by intersecting them. In this paper we characterize the method briefly and explain the modelling of the protein structure in detail. For the pairwise alignment the similarity of porin (1OMF) with bacteriochlorophyll a (3BCL) and BirA protein (1BIB) with DNA polymerase III (2POL) will be discussed. In the case of the multiple structure alignment the similarity in variants of four phosphatases and in subtilisin Carlsberg, carboxypeptidase, elongation factor Tu, and flavodoxin will be represented. Our first experiments show that the method works correctly and fast. The method can be used for arbitrary graphs. Thus, different graph-theoretical models of protein structures can be examined.
我们介绍了一种在一组蛋白质结构中寻找弱结构相似性的方法。蛋白质是在其二级结构水平上进行考虑的。该方法使用一种严格的图论算法来找出所有的结构相似性。蛋白质结构被建模为无向标记图,即所谓的蛋白质图。我们认为,为了检测两个蛋白质结构之间的相似性,在由紧密堆积的二级结构元件组成的蛋白质核心中找到相似性就足够了。因此,我们可以将自己限制在解决最大公共连通子图问题,而不是最大公共子图问题。我们对Bron和Kerbosch的算法进行了修改以解决该问题。算法的速度大幅提高。在计算一组蛋白质图中所有成对比较的所有最大公共连通子结构之后,可以通过将它们相交来计算所有蛋白质中的公共子结构。在本文中,我们简要描述了该方法,并详细解释了蛋白质结构的建模。对于成对比对,将讨论孔蛋白(1OMF)与细菌叶绿素a(3BCL)以及BirA蛋白(1BIB)与DNA聚合酶III(2POL)的相似性。在多结构比对的情况下,将展示四种磷酸酶变体以及枯草杆菌蛋白酶卡尔伯格、羧肽酶、延伸因子Tu和黄素氧还蛋白中的相似性。我们的首次实验表明该方法运行正确且快速。该方法可用于任意图形。因此,可以研究蛋白质结构的不同图论模型。