A topological algorithm for identification of structural domains of proteins.

Author information

Emmert-Streib Frank, Mushegian Arcady

Affiliation

Stowers Institute for Medical Research, Kansas City, MO 64110, USA.

Publication information

BMC Bioinformatics. 2007 Jul 3;8:237. doi: 10.1186/1471-2105-8-237.

Abstract

BACKGROUND

Identification of the structural domains of proteins is important for our understanding of the organizational principles and mechanisms of protein folding, and for insights into protein function and evolution. The algorithmic methods developed so far for dissecting proteins of known structure into domains are based on an examination of multiple geometrical, physical and topological features. Successful as many of these approaches are, they employ many heuristics, and it is not clear whether they illuminate any deep underlying principles of protein domain organization. Other well-performing domain dissection methods rely on comparative sequence analysis. These methods are applicable to sequences with known and unknown structure alike, and their success highlights a fundamental principle of protein modularity, but this does not directly improve our understanding of protein spatial structure.

RESULTS

We present a novel graph-theoretical algorithm for the identification of domains in proteins with known three-dimensional structure. We represent the protein structure as an undirected, unweighted and unlabeled graph whose nodes correspond to the secondary structure elements and whose edges represent physical proximity of at least one pair of alpha carbon atoms from two elements. Domains are identified as constrained partitions of the graph, corresponding to sets of vertices obtained by the maximization of the cycle distributions found in the graph. The decision to accept or reject a tentative cut position is based on a specific classifier. When a partition is accepted, the algorithm is applied iteratively to each of the resulting subgraphs, and it terminates automatically when partitions are no longer accepted. The distribution of cycles is the only type of information on which the decision about protein dissection is based. Despite the bare-bones simplicity of the approach, the accuracy of our algorithm approaches that of the best heuristic algorithms.
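The graph representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input layout (a dict of alpha carbon coordinates per secondary structure element) and the 7.0 Å cutoff are all assumptions made for illustration, since the abstract does not state a distance threshold. The cycle count shown is the standard cyclomatic number, used here only as a simple stand-in for the cycle distributions the abstract refers to.

```python
import math
from itertools import combinations


def build_sse_graph(sse_coords, cutoff=7.0):
    """Build an undirected, unweighted graph over secondary structure elements.

    sse_coords maps an element id to a list of (x, y, z) alpha carbon
    coordinates. Two elements are joined when at least one pair of their
    alpha carbons lies within `cutoff` (an assumed value, in angstroms).
    Returns a dict mapping each element id to the set of its neighbours.
    """
    graph = {name: set() for name in sse_coords}
    for a, b in combinations(sse_coords, 2):
        close = any(
            math.dist(p, q) <= cutoff
            for p in sse_coords[a]
            for q in sse_coords[b]
        )
        if close:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cyclomatic_number(graph):
    """Number of independent cycles: edges - vertices + connected components."""
    n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    seen = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first walk over one component
            node = stack.pop()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return n_edges - len(graph) + components


# Hypothetical toy coordinates: three mutually close elements and one far away.
toy = {
    "H1": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    "E1": [(0.0, 5.0, 0.0), (3.8, 5.0, 0.0)],
    "E2": [(0.0, 2.5, 6.0)],
    "H2": [(30.0, 0.0, 0.0)],
}
graph = build_sse_graph(toy)
n_cycles = cyclomatic_number(graph)
```

In this toy input, H1, E1 and E2 form a triangle and H2 is isolated, so the graph carries exactly one independent cycle. Note that once the graph is built, the coordinates are no longer needed.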

CONCLUSION

Our graph-theoretical algorithm uses only topological information present in the protein structure itself to find the domains and does not rely on any geometrical or physical information about the protein molecule. Perhaps unexpectedly, these drastic constraints on resources, which result in a seemingly approximate description of protein structures and leave only a handful of parameters available for analysis, do not lead to any significant deterioration of algorithm accuracy. It appears that protein structures can be rigorously treated as topological rather than geometrical objects and that the majority of information about protein domains can be inferred from the coarse-grained measure of pairwise proximity between secondary structure elements.
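To make "only topological information" concrete, the following sketch performs a recursive split using nothing but the adjacency structure. It is a simplified stand-in, not the published method: the abstract does not specify the cycle-based objective or the classifier, so the code assumes cuts are contiguous along the chain order, scores a cut by how many independent cycles survive inside the two parts, and accepts it with a hypothetical threshold rule (`keep_fraction`). The parameter `min_size` is likewise an assumption.

```python
from itertools import combinations


def from_edges(edges):
    """Adjacency dict (node -> set of neighbours) from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def induced(graph, nodes):
    """Subgraph induced by `nodes`."""
    keep = set(nodes)
    return {n: graph[n] & keep for n in nodes}


def cycle_rank(graph):
    """Independent cycles: edges - vertices + connected components."""
    n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    seen, components = set(), 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return n_edges - len(graph) + components


def split_domains(graph, order, min_size=2, keep_fraction=0.8):
    """Recursively split `order` (element ids in chain order) into domains.

    Assumed rule: take the contiguous cut that keeps the most cycles inside
    the two parts, and accept it only if at least `keep_fraction` of the
    cycles survive. Recursion stops when no cut is accepted.
    """
    total = cycle_rank(induced(graph, order))
    best = None
    for cut in range(min_size, len(order) - min_size + 1):
        left, right = order[:cut], order[cut:]
        kept = cycle_rank(induced(graph, left)) + cycle_rank(induced(graph, right))
        if best is None or kept > best[0]:
            best = (kept, left, right)
    if best is None or total == 0 or best[0] < keep_fraction * total:
        return [order]  # hypothetical classifier rejects the cut
    _, left, right = best
    return (split_domains(graph, left, min_size, keep_fraction)
            + split_domains(graph, right, min_size, keep_fraction))


# Toy graph: two densely connected groups joined by a single contact.
a = ["a1", "a2", "a3", "a4"]
b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(a, 2)) + list(combinations(b, 2)) + [("a4", "b1")]
toy_graph = from_edges(edges)
domains = split_domains(toy_graph, a + b)
```

On this toy graph the single bridging contact lies on no cycle, so cutting between the two groups preserves all six independent cycles and is accepted, while any further cut inside a group destroys its cycles and is rejected. The function never sees coordinates, distances or residue types.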

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd37/1933582/22222c1b44e0/1471-2105-8-237-1.jpg