Domain assignment for protein structures using a consensus approach: characterization and analysis.

Authors

Jones S, Stewart M, Michie A, Swindells M B, Orengo C, Thornton J M

Affiliation

Department of Biochemistry and Molecular Biology, University College, London, United Kingdom.

Publication

Protein Sci. 1998 Feb;7(2):233-42. doi: 10.1002/pro.5560070202.

PMID:9521098

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2143930/

Abstract

A consensus approach for the assignment of structural domains in proteins is presented. The approach combines a number of previously published algorithms and takes advantage of the elevated accuracy obtained when assignments from the individual algorithms are in agreement. The consensus approach is tested on a data set of 55 protein chains for which domain assignments from four automated methods were known and for which crystallographers' assignments had been reported in the literature. Accuracy was found to increase in this test from 72% using individual algorithms to 100% when all four methods were in agreement. However, a consensus prediction using all four methods was only possible for 52% of the data set. The consensus approach [using three publicly available domain assignment algorithms (PUU, DETECTIVE, DOMAK)] was then used to make domain assignments for a data set of 787 protein chains from the Protein Data Bank. Analysis of the assignments showed that 55.7% of assignments could be made automatically and that, of these, 13.5% were multi-domain proteins. Of the remaining 44.3% that could not be assigned by the consensus procedure, 90.4% had their domain boundaries assigned correctly by at least one of the algorithms. Once identified, these domains were analyzed for trends in their size and secondary structure class. In addition, the discontinuity of each domain along the protein chain was considered.
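
The abstract does not state the exact criterion used to decide that algorithms "agree", so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each method reports a sorted list of domain boundary positions (residue numbers) for a chain, and that two methods agree when they report the same number of boundaries and each corresponding pair lies within a hypothetical tolerance `tol` of residues. The function names, the `tol` parameter, the averaging step, and the example boundary values are illustrative and are not taken from the paper.

```python
from itertools import combinations
from typing import Dict, List, Optional


def boundaries_agree(a: List[int], b: List[int], tol: int) -> bool:
    """True if both lists hold the same number of boundaries and each
    corresponding pair differs by at most `tol` residues (assumed criterion)."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def consensus(assignments: Dict[str, List[int]], tol: int = 10) -> Optional[List[int]]:
    """Return averaged boundaries if every pair of methods agrees, else None.

    `assignments` maps a method name to its boundary positions; an empty
    list means the method called the chain a single domain.
    """
    lists = [sorted(v) for v in assignments.values()]
    # Require agreement between every pair of methods, not just with the first.
    if not all(boundaries_agree(a, b, tol) for a, b in combinations(lists, 2)):
        return None  # no consensus: the chain needs manual inspection
    # Average each boundary across methods, rounded to a whole residue.
    return [round(sum(pos) / len(lists)) for pos in zip(*lists)]


# Hypothetical boundary positions, for illustration only.
print(consensus({"PUU": [120], "DETECTIVE": [124], "DOMAK": [118]}))  # [121]
print(consensus({"PUU": [120], "DETECTIVE": [], "DOMAK": [118]}))     # None
```

Returning `None` when the methods disagree mirrors the split described in the abstract between chains assigned automatically and chains that could not be assigned by the consensus procedure. A boundary list like this cannot represent discontinuous domains; those would need a per-residue or per-segment representation.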
