Template-based recognition of protein fold within the midnight and twilight zones of protein sequence similarity.

Author information

Pirun Mono, Babnigg Gyorgy, Stevens Fred J

Affiliation

Department of Bioengineering, University of Illinois at Chicago, 60607, USA.

Publication information

J Mol Recognit. 2005 May-Jun;18(3):203-12. doi: 10.1002/jmr.728.

Abstract

Most homologous pairs of proteins have no significant sequence similarity to each other and are not identified by direct sequence comparison or profile-based strategies. However, multiple sequence alignments of low-similarity homologues typically reveal a limited number of positions that remain well conserved despite diversity of function. It may be inferred that conservation at most of these positions reflects the importance of these amino acids to the folding and stability of the protein. As such, these amino acids and their relative positions may define a structural signature. We demonstrate that extracting this fold template provides a basis for searching the sequence database for patterns consistent with the fold, enabling identification of homologs that are not recognized by global sequence analysis. The fold template method was developed to address the need for a tool that could comprehensively search the midnight and twilight zones of protein sequence similarity without relying on global statistical significance. The method was first implemented manually for three folds: immunoglobulin, C-type lectin and TIM barrel. Following proof of concept, an automated version of the approach was developed and used to build fold templates for 10 of the more highly populated folds in the SCOP database. The fold template method produced three-dimensional structural motifs or signatures that returned a diverse collection of proteins while maintaining a low false-positive rate. Although the manual fold template method gave more comprehensive results than the automated version, the diversity of the results from the automated method surpassed that of current methods that rely on statistical significance to infer evolutionary relationships among divergent proteins.
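The abstract describes a fold template as a small set of conserved residues together with their relative positions, which is then used to scan a sequence database for matching patterns. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It models a template as a PROSITE-like chain of conserved positions, each with a set of allowed residues and a minimum and maximum gap from the previous position, and scans sequences with a simple backtracking search. The names `TemplateElement`, `match_template`, `search_database` and the `TOY_TEMPLATE` values are hypothetical and invented for illustration. In particular, the toy template is not a real fold signature.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class TemplateElement:
    """One conserved position in a hypothetical fold template.

    residues: amino acids accepted at this position.
    min_gap/max_gap: allowed number of residues between the previous
    conserved position and this one (ignored for the first element).
    """
    residues: FrozenSet[str]
    min_gap: int = 0
    max_gap: int = 0


def match_template(seq: str, template: Sequence[TemplateElement]) -> Optional[List[int]]:
    """Return 0-based positions of the first template match in seq, or None."""
    if not template:
        return []

    def extend(idx: int, prev_pos: int, hits: List[int]) -> Optional[List[int]]:
        if idx == len(template):
            return hits  # every conserved position was placed
        elem = template[idx]
        lo = prev_pos + 1 + elem.min_gap
        hi = min(prev_pos + 1 + elem.max_gap, len(seq) - 1)
        for pos in range(lo, hi + 1):
            if seq[pos] in elem.residues:
                found = extend(idx + 1, pos, hits + [pos])
                if found is not None:
                    return found
        return None  # backtrack: no placement works after prev_pos

    for start, residue in enumerate(seq):
        if residue in template[0].residues:
            found = extend(1, start, [start])
            if found is not None:
                return found
    return None


def search_database(db: Dict[str, str], template: Sequence[TemplateElement]) -> Dict[str, List[int]]:
    """Scan every sequence and keep those consistent with the template."""
    hits: Dict[str, List[int]] = {}
    for name, seq in db.items():
        found = match_template(seq, template)
        if found is not None:
            hits[name] = found
    return hits


# Toy template with made-up residues and gaps (not a real fold signature).
TOY_TEMPLATE = [
    TemplateElement(frozenset("C")),
    TemplateElement(frozenset("W"), min_gap=2, max_gap=4),
    TemplateElement(frozenset("LIV"), min_gap=1, max_gap=3),
    TemplateElement(frozenset("C"), min_gap=3, max_gap=6),
]

print(search_database({"p1": "AACGGGWAALKKKKCA", "p2": "AAAAAA"}, TOY_TEMPLATE))
# prints {'p1': [2, 6, 9, 14]}
```

Because each gap is a range rather than a fixed offset, a single template can tolerate the length variation seen among divergent homologues without computing any global similarity score. A template derived from real structures would also need some way to score partial matches and control false positives. This sketch omits both.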
