
Quantifying structure-function uncertainty: a graph theoretical exploration into the origins and limitations of protein annotation.

Authors

Shakhnovich Boris E, Max Harvey J

Affiliation

Bioinformatics Program, Boston University, Boston, MA 02215, USA.

Publication

J Mol Biol. 2004 Apr 2;337(4):933-49. doi: 10.1016/j.jmb.2004.02.009.

DOI: 10.1016/j.jmb.2004.02.009
PMID: 15033362
Abstract

Since the advent of investigations into structural genomics, research has focused on correctly identifying domain boundaries, as well as domain similarities and differences in the context of their evolutionary relationships. As the science of structural genomics ramps up adding more and more information into the databanks, questions about the accuracy and completeness of our classification and annotation systems appear on the forefront of this research. A central question of paramount importance is how structural similarity relates to functional similarity. Here, we begin to rigorously and quantitatively answer these questions by first exploring the consensus between the most common protein domain structure annotation databases CATH, SCOP and FSSP. Each of these databases explores the evolutionary relationships between protein domains using a combination of automatic and manual, structural and functional, continuous and discrete similarity measures. In order to examine the issue of consensus thoroughly, we build a generalized graph out of each of these databases and hierarchically cluster these graphs at interval thresholds. We then employ a distance measure to find regions of greatest overlap. Using this procedure we were able not only to enumerate the level of consensus between the different annotation systems, but also to define the graph-theoretical origins behind the annotation schema of class, family and superfamily by observing that the same thresholds that define the best consensus regions between FSSP, SCOP and CATH correspond to distinct, non-random phase-transitions in the structure comparison graph itself. To investigate the correspondence in divergence between structure and function further, we introduce a measure of functional entropy that calculates divergence in function space. First, we use this measure to calculate the general correlation between structural homology and functional proximity. 
We extend this analysis further by quantitatively calculating the average amount of functional information gained from our understanding of structural distance and the corollary inherent uncertainty that represents the theoretical limit of our ability to infer function from structural similarity. Finally we show how our measure of functional "entropy" translates into a more intuitive concept of functional annotation into similarity EC classes.
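
The abstract gives the pipeline only in outline, so the following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes a structure comparison graph whose edge scores grow with similarity, clusters it by keeping only edges at or above a threshold (connected components stand in for the hierarchical clustering), compares each clustering with a reference classification using a pair-based Jaccard distance, and scores functional divergence inside a cluster as the Shannon entropy of its function labels. The function names, the toy domains `d1` to `d4`, the scores, and the EC labels are all hypothetical, and both the distance and the entropy formula are stand-ins for measures the abstract does not define.

```python
from collections import Counter
from itertools import combinations
from math import log2


def threshold_clusters(nodes, edges, threshold):
    """Connected components after keeping only edges with score >= threshold."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())


def co_clustered_pairs(clusters):
    """Set of unordered domain pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}


def partition_distance(clusters_a, clusters_b):
    """1 - Jaccard similarity of co-clustered pairs (0.0 means identical groupings)."""
    pa, pb = co_clustered_pairs(clusters_a), co_clustered_pairs(clusters_b)
    if not pa and not pb:
        return 0.0
    return 1.0 - len(pa & pb) / len(pa | pb)


def functional_entropy(cluster, function_of):
    """Shannon entropy, in bits, of the function labels within one cluster."""
    counts = Counter(function_of[d] for d in cluster)
    total = sum(counts.values())
    return sum(-(c / total) * log2(c / total) for c in counts.values())


# Hypothetical toy data: four domains, pairwise similarity scores,
# a reference classification, and illustrative EC labels.
nodes = ["d1", "d2", "d3", "d4"]
edges = [("d1", "d2", 9.0), ("d2", "d3", 4.0), ("d3", "d4", 8.0)]
reference = [{"d1", "d2"}, {"d3", "d4"}]
ec = {"d1": "EC 3.4", "d2": "EC 3.4", "d3": "EC 2.7", "d4": "EC 1.1"}

# Scan thresholds and report how far each clustering is from the reference.
for t in (2.0, 6.0, 10.0):
    clusters = threshold_clusters(nodes, edges, t)
    print(t, len(clusters), round(partition_distance(clusters, reference), 3))
```

In this toy case the middle threshold reproduces the reference grouping exactly, while the lower one merges everything and the higher one splits every domain apart. A cluster whose members share one label has zero entropy, and a cluster split evenly between two labels has one bit.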

Similar articles

1
Quantifying structure-function uncertainty: a graph theoretical exploration into the origins and limitations of protein annotation.
J Mol Biol. 2004 Apr 2;337(4):933-49. doi: 10.1016/j.jmb.2004.02.009.
2
Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores.
J Mol Biol. 2000 Mar 17;297(1):233-49. doi: 10.1006/jmbi.2000.3550.
3
Graph sharpening plus graph integration: a synergy that improves protein functional classification.
Bioinformatics. 2007 Dec 1;23(23):3217-24. doi: 10.1093/bioinformatics/btm511. Epub 2007 Oct 31.
4
ELISA: a unified, multidimensional view of the protein domain universe.
Genome Inform. 2004;15(1):213-20.
5
Improving the precision of the structure-function relationship by considering phylogenetic context.
PLoS Comput Biol. 2005 Jun;1(1):e9. doi: 10.1371/journal.pcbi.0010009. Epub 2005 Jun 24.
6
Fast model-based protein homology detection without alignment.
Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
7
Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.
Proteins. 2008 Sep;72(4):1333-51. doi: 10.1002/prot.22015.
8
Detecting similarities among distant homologous proteins by comparison of domain flexibilities.
Protein Eng Des Sel. 2007 Jun;20(6):285-99. doi: 10.1093/protein/gzm021. Epub 2007 Jun 15.
9
Fold independent structural comparisons of protein-ligand binding sites for exploring functional relationships.
J Mol Biol. 2006 Feb 3;355(5):1112-24. doi: 10.1016/j.jmb.2005.11.044. Epub 2005 Dec 1.
10
Toward consistent assignment of structural domains in proteins.
J Mol Biol. 2004 Jun 4;339(3):647-78. doi: 10.1016/j.jmb.2004.03.053.

Cited by

1
Issues in bioinformatics benchmarking: the case study of multiple sequence alignment.
Nucleic Acids Res. 2010 Nov;38(21):7353-63. doi: 10.1093/nar/gkq625. Epub 2010 Jul 17.
2
Evolutionary constraints on structural similarity in orthologs and paralogs.
Protein Sci. 2009 Jun;18(6):1306-15. doi: 10.1002/pro.143.
3
Defining functional distance using manifold embeddings of gene ontology annotations.
Proc Natl Acad Sci U S A. 2007 Jul 3;104(27):11334-9. doi: 10.1073/pnas.0702965104. Epub 2007 Jun 26.
4
Origins and impact of constraints in evolution of gene families.
Genome Res. 2006 Dec;16(12):1529-36. doi: 10.1101/gr.5346206. Epub 2006 Oct 19.
5
Improving the precision of the structure-function relationship by considering phylogenetic context.
PLoS Comput Biol. 2005 Jun;1(1):e9. doi: 10.1371/journal.pcbi.0010009. Epub 2005 Jun 24.