A topological algorithm for identification of structural domains of proteins.

Authors

Emmert-Streib Frank, Mushegian Arcady

Affiliation

Stowers Institute for Medical Research, Kansas City, MO 64110, USA.

Publication

BMC Bioinformatics. 2007 Jul 3;8:237. doi: 10.1186/1471-2105-8-237.

DOI:10.1186/1471-2105-8-237

PMID:17608939

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1933582/

Abstract

BACKGROUND

Identification of the structural domains of proteins is important for our understanding of the organizational principles and mechanisms of protein folding, and for insights into protein function and evolution. The algorithmic methods developed so far for dissecting proteins of known structure into domains are based on an examination of multiple geometrical, physical and topological features. Successful as many of these approaches are, they rely on many heuristics, and it is not clear whether they illuminate any deep underlying principles of protein domain organization. Other well-performing domain dissection methods rely on comparative sequence analysis. These methods are applicable to sequences with known and unknown structure alike, and their success highlights a fundamental principle of protein modularity, but it does not directly improve our understanding of protein spatial structure.

RESULTS

We present a novel graph-theoretical algorithm for the identification of domains in proteins with known three-dimensional structure. We represent the protein structure as an undirected, unweighted and unlabeled graph whose nodes correspond to the secondary structure elements and whose edges represent physical proximity of at least one pair of alpha carbon atoms from two elements. Domains are identified as constrained partitions of the graph, corresponding to sets of vertices obtained by the maximization of the cycle distributions found in the graph. The decision to accept or reject a tentative cut position is based on a specific classifier. When a partition is accepted, the algorithm is applied iteratively to each of the resulting subgraphs, and it terminates automatically once partitions are no longer accepted. The distribution of cycles is the only type of information on which the decision about protein dissection is based. Despite the bare-bones simplicity of the approach, our algorithm approaches the best heuristic algorithms in accuracy.
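The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the 7.0 Å cutoff, the use of triangle (3-cycle) counts as a stand-in for the full cycle distribution, the restriction to cuts along the chain order, the minimum part size, and the simple threshold that plays the role of the classifier. The function names `build_sse_graph`, `count_triangles` and `split_domains` are hypothetical.

```python
import math
from itertools import combinations


def build_sse_graph(sse_coords, cutoff=7.0):
    """Build an undirected, unweighted graph over secondary structure elements.

    sse_coords: one list of (x, y, z) alpha-carbon coordinates per element.
    Two elements are linked if at least one pair of their alpha carbons lies
    within `cutoff` (an assumed value, not taken from the paper).
    """
    graph = {i: set() for i in range(len(sse_coords))}
    for i, j in combinations(range(len(sse_coords)), 2):
        if any(math.dist(a, b) <= cutoff
               for a in sse_coords[i] for b in sse_coords[j]):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def count_triangles(graph, nodes):
    """Count 3-cycles in the subgraph induced by `nodes`.

    Triangles are used here as a simplified proxy for the cycle distribution.
    """
    return sum(1 for a, b, c in combinations(sorted(set(nodes)), 3)
               if b in graph[a] and c in graph[a] and c in graph[b])


def split_domains(graph, nodes, min_kept=0.9, min_size=3):
    """Recursively cut the element list into tentative domains.

    A cut along the chain order is accepted only if the two parts together
    keep at least `min_kept` of the triangles of the uncut graph. This
    threshold is a hypothetical stand-in for the paper's classifier.
    """
    nodes = sorted(nodes)
    total = count_triangles(graph, nodes)
    cuts = range(min_size, len(nodes) - min_size + 1)
    if total == 0 or len(cuts) == 0:
        return [nodes]
    # Pick the cut position that keeps the most cycles inside the two parts.
    score, k = max(
        (count_triangles(graph, nodes[:k]) + count_triangles(graph, nodes[k:]), k)
        for k in cuts
    )
    if score < min_kept * total:
        return [nodes]  # cut rejected: treat these elements as one domain
    return (split_domains(graph, nodes[:k], min_kept, min_size)
            + split_domains(graph, nodes[k:], min_kept, min_size))
```

Under these assumptions, a toy graph made of two triangles joined by a single edge would be split once and then left alone, because the recursion stops as soon as no candidate cut passes the acceptance test. This mirrors the automatic termination described in the abstract.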

CONCLUSION

Our graph-theoretical algorithm uses only topological information present in the protein structure itself to find the domains and does not rely on any geometrical or physical information about the protein molecule. Perhaps unexpectedly, these drastic constraints on resources, which result in a seemingly approximate description of protein structures and leave only a handful of parameters available for analysis, do not lead to any significant deterioration of algorithm accuracy. It appears that protein structures can be rigorously treated as topological rather than geometrical objects and that the majority of information about protein domains can be inferred from the coarse-grained measure of pairwise proximity between secondary structure elements.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd37/1933582/22222c1b44e0/1471-2105-8-237-1.jpg
