CATHEDRAL: a fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures.

Authors

Redfern Oliver C, Harrison Andrew, Dallman Tim, Pearl Frances M G, Orengo Christine A

Affiliation

Department of Biochemistry and Molecular Biology, University College London, London, United Kingdom.

Publication

PLoS Comput Biol. 2007 Nov;3(11):e232. doi: 10.1371/journal.pcbi.0030232.

Abstract

We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure-based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.
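
The iterative "verify, excise, repeat" control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the CATHEDRAL implementation: the names `Hit`, `assign_domains`, `toy_scorer` and the `min_score` threshold are hypothetical, and the scorer is a stand-in for the graph-based fold scan, the double-dynamic programming alignment and the support vector machine score, none of which are reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Hit:
    """A proposed domain assignment (all fields are illustrative)."""
    fold_id: str   # label of the matched library fold
    start: int     # first residue index of the domain (inclusive)
    end: int       # last residue index of the domain (inclusive)
    score: float   # stand-in for the SVM-derived score

    def residues(self) -> Set[int]:
        return set(range(self.start, self.end + 1))


def assign_domains(
    chain_length: int,
    find_best_hit: Callable[[Set[int]], Optional[Hit]],
    min_score: float,
) -> List[Hit]:
    """Repeatedly accept the best hit, excise it, and search again."""
    remaining = set(range(chain_length))
    domains: List[Hit] = []
    while remaining:
        hit = find_best_hit(remaining)
        # Stop when nothing recognisable is left in the chain.
        if hit is None or hit.score < min_score:
            break
        covered = hit.residues() & remaining
        if not covered:
            break  # guard: a hit that adds no new residues would loop forever
        domains.append(hit)
        remaining -= covered  # "excise" the verified domain
    return domains


def toy_scorer(candidates: List[Hit]) -> Callable[[Set[int]], Optional[Hit]]:
    """Hypothetical scorer: picks the best precomputed hit that still fits
    entirely inside the unassigned part of the chain."""
    def find(remaining: Set[int]) -> Optional[Hit]:
        usable = [h for h in candidates if h.residues() <= remaining]
        return max(usable, key=lambda h: h.score, default=None)
    return find


# Made-up candidates for a 250-residue chain; scores are arbitrary.
candidates = [
    Hit("foldA", 0, 99, 0.9),
    Hit("foldB", 100, 219, 0.8),
    Hit("foldC", 50, 150, 0.7),   # overlaps foldA, unusable once foldA is excised
    Hit("foldD", 220, 249, 0.2),  # below the acceptance threshold
]
result = assign_domains(250, toy_scorer(candidates), min_score=0.5)
print([h.fold_id for h in result])
```

In this toy run the overlapping candidate is discarded once the higher-scoring domain has been excised, and the search stops when the only remaining candidate falls below the threshold, which mirrors the stopping condition "until all recognisable domains have been identified".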

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d5b/2098860/417bedb28a7d/pcbi.0030232.g001.jpg
