Assessing strategies for improved superfamily recognition.

Authors

Sillitoe Ian, Dibley Mark, Bray James, Addou Sarah, Orengo Christine

Affiliation

Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College London, UK.

Publication details

Protein Sci. 2005 Jul;14(7):1800-10. doi: 10.1110/ps.041056105. Epub 2005 Jun 3.

DOI:10.1110/ps.041056105
PMID:15937274
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2253352/
Abstract

There are more than 200 completed genomes and over 1 million nonredundant sequences in public repositories. Although the structural data are more sparse (approximately 13,000 nonredundant structures solved to date), several powerful sequence-based methodologies now allow these structures to be mapped onto related regions in a significant proportion of genome sequences. We review a number of publicly available strategies for providing structural annotations for genome sequences, and we describe the protocol adopted to provide CATH structural annotations for completed genomes. In particular, we assess the performance of several sequence-based protocols employing Hidden Markov model (HMM) technologies for superfamily recognition, including a new approach (SAMOSA [sequence augmented models of structure alignments]) that exploits multiple structural alignments from the CATH domain structure database when building the models. Using a data set of remote homologs detected by structure comparison and manually validated in CATH, a single-seed HMM library was able to recognize 76% of the data set. Including the SAMOSA models in the HMM library showed little gain in homolog recognition, although a slight improvement in alignment quality was observed for very remote homologs. However, using an expanded 1D-HMM library, CATH-ISL increased the coverage to 86%. The single-seed HMM library has been used to annotate the protein sequences of 120 genomes from all three major kingdoms, allowing up to 70% of the genes or partial genes to be assigned to CATH superfamilies. It has also been used to recruit sequences from Swiss-Prot and TrEMBL into CATH domain superfamilies, expanding the CATH database eightfold.
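The coverage figures quoted in the abstract (76% and 86%) are fractions of a validated remote-homolog data set that an HMM library recognizes. A minimal sketch of how such a figure could be computed is given below. It is not the authors' protocol: the hit format, the E-value cutoff of 1e-3, the domain names, and the superfamily labels are all assumptions made up for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical hit record: (query domain, superfamily of the matching model, E-value).
Hit = Tuple[str, str, float]


def coverage(hits: Iterable[Hit], truth: Dict[str, str],
             max_evalue: float = 1e-3) -> float:
    """Fraction of benchmark domains whose best hit at or below
    max_evalue is to a model from the correct superfamily."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, superfamily, evalue in hits:
        # Ignore queries outside the benchmark and hits above the cutoff.
        if query not in truth or evalue > max_evalue:
            continue
        # Keep only the lowest E-value hit per query.
        if query not in best or evalue < best[query][1]:
            best[query] = (superfamily, evalue)
    recognized = sum(1 for q, (sf, _) in best.items() if truth[q] == sf)
    return recognized / len(truth) if truth else 0.0


# Toy benchmark (invented values, not data from the paper).
truth = {"d1": "sf_A", "d2": "sf_B", "d3": "sf_C", "d4": "sf_D"}
hits = [
    ("d1", "sf_A", 1e-10),  # correct and significant
    ("d2", "sf_B", 1e-5),   # correct and significant
    ("d2", "sf_C", 1e-2),   # weaker, wrong hit above the cutoff
    ("d3", "sf_A", 1e-4),   # significant but wrong superfamily
    ("d4", "sf_D", 0.5),    # correct but above the cutoff
]

print(coverage(hits, truth))
```

Comparing two libraries under this definition amounts to running the same benchmark against each and comparing the returned fractions; loosening `max_evalue` trades more recognized homologs for a higher risk of false assignments.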
