A comparison of sequence and structure protein domain families as a basis for structural genomics.

Authors

Elofsson A, Sonnhammer E L

Affiliation

Department of Biochemistry, Stockholm University, 106 91 Stockholm, Sweden.

Publication

Bioinformatics. 1999 Jun;15(6):480-500. doi: 10.1093/bioinformatics/15.6.480.

DOI:10.1093/bioinformatics/15.6.480
PMID:10383473
Abstract

MOTIVATION

Protein families can be defined based on structure or sequence similarity. We wanted to compare two protein family databases, one based on structural and one on sequence similarity, to investigate to what extent they overlap, the similarity in definition of corresponding families, and to create a list of large protein families with unknown structure as a resource for structural genomics. We also wanted to increase the sensitivity of fold assignment by exploiting protein family HMMs.

RESULTS

We compared Pfam, a protein family database based on sequence similarity, to Scop, which is based on structural similarity. We found that 70% of the Scop families exist in Pfam while 57% of the Pfam families exist in Scop. Most families that occur in both databases correspond well to each other, but in some cases they are different. Such cases highlight situations in which structure and sequence approaches differ significantly. The comparison enabled us to compile a list of the largest families that do not occur in Scop; these are suitable targets for structure prediction and determination, and may be useful to guide projects in structural genomics. It can be noted that 13 out of the 20 largest protein families without a known structure are likely transmembrane proteins. We also exploited Pfam to increase the sensitivity of detecting homologs of proteins with known structure, by comparing query sequences to Pfam HMMs that correspond to Scop families. For SWISSPROT+TREMBL, this yielded an increase in fold assignment from 31% to 42% compared to using FASTA only. This method assigned a structure to 22% of the proteins in Saccharomyces cerevisiae, 24% in Escherichia coli, and 16% in Methanococcus jannaschii.
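The two overlap figures above are asymmetric because each is normalized by a different database, and the target list is simply the unmatched sequence families ranked by size. A minimal sketch of that bookkeeping follows, assuming a precomputed list of family-to-family links. The family names, sizes, and links are invented toy data, and the helper names `overlap_percent` and `largest_unmatched` are hypothetical; none of this reproduces the authors' actual matching procedure.

```python
from typing import Dict, List, Set, Tuple


def overlap_percent(families: Set[str], matched: Set[str]) -> float:
    """Percentage of `families` that have at least one counterpart."""
    if not families:
        return 0.0
    return 100.0 * len(families & matched) / len(families)


def largest_unmatched(
    sizes: Dict[str, int], matched: Set[str], top_n: int
) -> List[Tuple[str, int]]:
    """Largest families (by member count) with no counterpart."""
    unmatched = [(name, n) for name, n in sizes.items() if name not in matched]
    # Sort by size descending, then by name for a stable order.
    return sorted(unmatched, key=lambda item: (-item[1], item[0]))[:top_n]


# Toy data (assumption): sequence-family sizes, structure families,
# and links between families judged to correspond to each other.
pfam_sizes = {"PF_A": 120, "PF_B": 45, "PF_C": 300, "PF_D": 80}
scop_families = {"SC_1", "SC_2", "SC_3"}
links = [("PF_A", "SC_1"), ("PF_B", "SC_1"), ("PF_D", "SC_2")]

pfam_matched = {p for p, _ in links}
scop_matched = {s for _, s in links}

# Each percentage uses its own database as the denominator.
pfam_in_scop = overlap_percent(set(pfam_sizes), pfam_matched)
scop_in_pfam = overlap_percent(scop_families, scop_matched)

# Families without a structural counterpart, largest first.
targets = largest_unmatched(pfam_sizes, pfam_matched, top_n=2)
```

Because several sequence families can map to one structure family (here `PF_A` and `PF_B` both link to `SC_1`), the two percentages need not agree, which is consistent with the 70% and 57% figures differing.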

Similar articles

1
A comparison of sequence and structure protein domain families as a basis for structural genomics.
Bioinformatics. 1999 Jun;15(6):480-500. doi: 10.1093/bioinformatics/15.6.480.
2
SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes.
Nucleic Acids Res. 2002 Jan 1;30(1):289-93. doi: 10.1093/nar/30.1.289.
3
SUPFAM: a database of sequence superfamilies of protein domains.
BMC Bioinformatics. 2004 Mar 15;5:28. doi: 10.1186/1471-2105-5-28.
4
Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores.
J Mol Biol. 2000 Mar 17;297(1):233-49. doi: 10.1006/jmbi.2000.3550.
5
Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins.
Nucleic Acids Res. 1999 Jan 1;27(1):260-2. doi: 10.1093/nar/27.1.260.
6
De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods.
Biol Direct. 2015 Jul 31;10:38. doi: 10.1186/s13062-015-0069-2.
7
A classification of disulfide patterns and its relationship to protein structure and function.
Protein Sci. 2004 Aug;13(8):2045-58. doi: 10.1110/ps.04613004.
8
De novo prediction of three-dimensional structures for major protein families.
J Mol Biol. 2002 Sep 6;322(1):65-78. doi: 10.1016/s0022-2836(02)00698-8.
9
Structural similarity to bridge sequence space: finding new families on the bridges.
Protein Sci. 2005 May;14(5):1305-14. doi: 10.1110/ps.041187405.
10
Expectations from structural genomics.
Protein Sci. 2000 Jan;9(1):197-200. doi: 10.1110/ps.9.1.197.

Cited by

1
GeneralizedDTA: combining pre-training and multi-task learning to predict drug-target binding affinity for unknown drug discovery.
BMC Bioinformatics. 2022 Sep 7;23(1):367. doi: 10.1186/s12859-022-04905-6.
2
Comparative genomic analysis of SET domain family reveals the origin, expansion, and putative function of the arthropod-specific SmydA genes as histone modifiers in insects.
Gigascience. 2017 Jun 1;6(6):1-16. doi: 10.1093/gigascience/gix031.
3
Accelerating Information Retrieval from Profile Hidden Markov Model Databases.
PLoS One. 2016 Nov 22;11(11):e0166358. doi: 10.1371/journal.pone.0166358. eCollection 2016.
4
The dynamics and evolutionary potential of domain loss and emergence.
Mol Biol Evol. 2012 Feb;29(2):787-96. doi: 10.1093/molbev/msr250. Epub 2011 Oct 19.
5
Evolution of a domain conserved in microtubule-associated proteins of eukaryotes.
Adv Appl Bioinform Chem. 2008;1:51-69. doi: 10.2147/aabc.s3211. Epub 2008 Sep 23.
6
A comprehensive system for evaluation of remote sequence similarity detection.
BMC Bioinformatics. 2007 Aug 28;8:314. doi: 10.1186/1471-2105-8-314.
7
Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint.
BMC Bioinformatics. 2007 Mar 9;8:86. doi: 10.1186/1471-2105-8-86.
8
SNP@Domain: a web resource of single nucleotide polymorphisms (SNPs) within protein domain structures and sequences.
Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W642-4. doi: 10.1093/nar/gkl323.
9
Comparative mapping of sequence-based and structure-based protein domains.
BMC Bioinformatics. 2005 Mar 25;6:77. doi: 10.1186/1471-2105-6-77.
10
SUPFAM: a database of sequence superfamilies of protein domains.
BMC Bioinformatics. 2004 Mar 15;5:28. doi: 10.1186/1471-2105-5-28.