A supersecondary structure library and search algorithm for modeling loops in protein structures.

Authors

Fernandez-Fuentes Narcis, Oliva Baldomero, Fiser András

Affiliation

Department of Biochemistry and Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA.

Publication information

Nucleic Acids Res. 2006 Apr 14;34(7):2085-97. doi: 10.1093/nar/gkl156. Print 2006.

Abstract

We present a fragment-search based method for predicting loop conformations in protein models. A hierarchical and multidimensional database has been set up that currently classifies 105,950 loop fragments and loop-flanking secondary structures. Besides the length of the loops and the types of bracing secondary structures, the database is organized along four internal coordinates, a distance and three types of angles, that characterize the geometry of the stem regions. Candidate fragments are selected from this library by matching the loop length and the types of bracing secondary structures of the query and by satisfying the geometrical restraints of the stems. The selected fragments are then inserted into the query protein framework, where their fit is assessed by the root mean square deviation (r.m.s.d.) of the stem regions and by the number of rigid body clashes with the environment. In the final step, the remaining candidate loops are ranked by a Z-score that combines information on sequence similarity and on the fit between predicted and observed phi/psi main chain dihedral angle propensities. Confidence Z-score cut-offs were determined for each loop length to identify those predicted fragments that outperform a competitive ab initio method. A web server implements the method, regularly updates the fragment library and performs predictions. Predicted segments are returned as is or, optionally, completed with side chain reconstruction and subsequently annealed in the environment of the query protein by conjugate gradient minimization. The prediction method was tested on artificially prepared search datasets from which all trivial sequence similarities at the SCOP superfamily level were removed. Under these conditions it is possible to predict loops of length 4, 8 and 12 with coverage of 98, 78 and 28% and an r.m.s.d. accuracy of at least 0.22, 1.38 and 2.47 Å, respectively. In a head-to-head comparison on loops extracted from freshly deposited new protein folds, the current method outperformed an earlier database search method by a ratio of approximately 5:1.
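
The selection, fit assessment and ranking stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Fragment` fields, the tolerance values (`dist_tol`, `angle_tol`), the clash `cutoff`, the `raw_score` placeholder and the convention that a higher Z-score is better are all assumptions made for illustration. The sketch also assumes that stem coordinates have already been superposed before the r.m.s.d. is computed, and it standardizes scores against the candidate set itself, which may differ from the reference distribution used in the paper.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Fragment:
    """Hypothetical library entry; all field names are illustrative."""
    name: str
    length: int              # number of loop residues
    bracing: str             # types of bracing secondary structures, e.g. "HE"
    distance: float          # stem-to-stem distance in angstroms
    angles: tuple            # three stem angles in degrees
    raw_score: float = 0.0   # placeholder for the combined sequence / phi-psi score


def angle_diff(a, b):
    """Smallest absolute difference between two angles given in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_candidates(library, query, dist_tol=1.0, angle_tol=30.0):
    """Keep fragments that match loop length, bracing type and stem geometry.

    The tolerances are assumed values, not those of the published method.
    """
    return [
        f for f in library
        if f.length == query.length
        and f.bracing == query.bracing
        and abs(f.distance - query.distance) <= dist_tol
        and all(angle_diff(a, b) <= angle_tol
                for a, b in zip(f.angles, query.angles))
    ]


def stem_rmsd(coords_a, coords_b):
    """r.m.s.d. between two equal-length lists of (x, y, z) points.

    Assumes the two coordinate sets are already superposed.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def count_clashes(loop_atoms, env_atoms, cutoff=3.0):
    """Count loop/environment atom pairs closer than an assumed cutoff."""
    return sum(1 for p in loop_atoms for q in env_atoms
               if math.dist(p, q) < cutoff)


def rank_by_z(candidates):
    """Rank candidates by Z-score of raw_score, best (highest) first."""
    if not candidates:
        return []
    scores = [f.raw_score for f in candidates]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [(f, 0.0) for f in candidates]
    scored = [(f, (f.raw_score - mu) / sigma) for f in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In use, a query would be described with the same `Fragment` fields as the library entries. `select_candidates` would narrow the library, `stem_rmsd` and `count_clashes` would discard poorly fitting fragments, and `rank_by_z` would order what remains. A per-length Z-score cut-off, as mentioned in the abstract, would then be applied to the ranked list.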

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238e/1440879/7b51d97f72d1/gkl156f1.jpg