Matching of structural motifs using hashing on residue labels and geometric filtering for protein function prediction.

Author information

Moll Mark, Kavraki Lydia E

Affiliation

Department of Computer Science, Rice University, Houston, TX 77005, USA.

Publication information

Comput Syst Bioinformatics Conf. 2008;7:157-68.

Abstract

There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity. Our focus is on methods that determine binding site similarity. Although several such methods exist, it remains challenging to quickly find, with high specificity, all functionally related matches for structural motifs in large data sets. In this context, a structural motif is a set of 3D points annotated with physicochemical information that characterize a molecular function. We propose a new method called LabelHash that creates hash tables of n-tuples of residues for a set of targets. Using these hash tables, we can quickly look up partial matches to a motif and expand those matches to complete matches. We show that by applying only very mild geometric constraints we can find statistically significant matches with extremely high specificity in very large data sets and for very general structural motifs. We demonstrate that our method requires a reasonable amount of storage when employing a simple geometric filter, and that it further improves on the specificity of our previous work while maintaining very high sensitivity. Our algorithm is evaluated on 20 homolog classes, with a non-redundant version of the Protein Data Bank as the background data set. We use cluster analysis to explain why certain classes of homologs are more difficult to classify than others. The LabelHash algorithm is implemented on a web server at http://kavrakilab.org/labelhash/.
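The abstract describes LabelHash only at a high level. The following Python sketch illustrates the general idea of hashing residue-label n-tuples and then expanding partial matches. It is not the authors' implementation, and many details are our own simplifying assumptions. The function names (`build_label_hash`, `consistent`, `find_matches`) are hypothetical. We also assume the tuple size `n=3`, a pairwise-distance cutoff `max_dist=12.0` as the "simple geometric filter", and a distance tolerance `tol=1.0`. Two further choices are ours: partial matches are extended greedily (first fit), and candidates are checked by pairwise-distance consistency rather than by whatever scoring the paper uses.

```python
import itertools
import math
from collections import defaultdict

# Each residue (or motif point) is a (label, (x, y, z)) pair; labels are
# one-letter residue codes. This data layout is an assumption of the sketch.


def build_label_hash(targets, n=3, max_dist=12.0):
    """Map each sorted n-tuple of residue labels to the (target id, residue
    indices) tuples that carry it. Only tuples whose residues are pairwise
    within max_dist are stored (a simple geometric filter; threshold assumed).
    Enumerating all n-combinations is O(L^n) per target: fine for a toy."""
    table = defaultdict(list)
    for tid, residues in targets.items():
        for idx in itertools.combinations(range(len(residues)), n):
            coords = [residues[i][1] for i in idx]
            if all(math.dist(a, b) <= max_dist
                   for a, b in itertools.combinations(coords, 2)):
                key = tuple(sorted(residues[i][0] for i in idx))
                table[key].append((tid, idx))
    return table


def consistent(motif, target, assign, tol):
    """True if every pair of assigned motif points is matched by target
    residues whose distance differs from the motif distance by <= tol."""
    for (i, ti), (j, tj) in itertools.combinations(assign.items(), 2):
        d_motif = math.dist(motif[i][1], motif[j][1])
        d_target = math.dist(target[ti][1], target[tj][1])
        if abs(d_motif - d_target) > tol:
            return False
    return True


def find_matches(motif, targets, table, n=3, tol=1.0):
    """Look up partial matches for the first n motif points, then extend
    each one greedily (first fit) to cover the remaining motif points."""
    seed_labels = tuple(label for label, _ in motif[:n])
    matches = []
    for tid, idx in table.get(tuple(sorted(seed_labels)), []):
        target = targets[tid]
        # Stored indices are in target order, not motif order, so try every
        # ordering whose labels line up with the seed.
        for perm in itertools.permutations(idx):
            if tuple(target[t][0] for t in perm) != seed_labels:
                continue
            assign = dict(enumerate(perm))  # motif index -> residue index
            if not consistent(motif, target, assign, tol):
                continue
            for m in range(n, len(motif)):
                for t, (label, _) in enumerate(target):
                    if label == motif[m][0] and t not in assign.values():
                        assign[m] = t
                        if consistent(motif, target, assign, tol):
                            break
                        del assign[m]
                else:
                    break  # no residue fits motif point m: give up
            if len(assign) == len(motif):
                matches.append((tid, [assign[m] for m in range(len(motif))]))
    return matches


if __name__ == "__main__":
    motif = [("H", (0, 0, 0)), ("D", (3, 0, 0)), ("S", (0, 4, 0)), ("G", (0, 0, 5))]
    targets = {
        # The motif shifted by (10, 10, 10), residues shuffled, plus a far-away H.
        "t1": [("S", (10, 14, 10)), ("H", (10, 10, 10)), ("G", (10, 10, 15)),
               ("D", (13, 10, 10)), ("H", (60, 60, 60))],
        # A decoy whose D sits 6 units from H instead of 3.
        "t2": [("H", (0, 0, 0)), ("D", (6, 0, 0)), ("S", (0, 4, 0)), ("G", (0, 0, 5))],
    }
    table = build_label_hash(targets)
    print(find_matches(motif, targets, table))  # [('t1', [1, 3, 0, 2])]
```

In this sketch, sorting the labels makes each hash key independent of residue order. Storing only geometrically compact tuples keeps the table small, which loosely mirrors the abstract's point that a simple geometric filter keeps storage reasonable. Loosening `tol` (for example to 4.0) also admits the decoy `t2`, which shows how the tolerance trades specificity for sensitivity.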
