A sequence family database built on ECOD structural domains.

Author information

Department of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA.

Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA.

Publication information

Bioinformatics. 2018 Sep 1;34(17):2997-3003. doi: 10.1093/bioinformatics/bty214.

Abstract

MOTIVATION

The ECOD database classifies protein domains based on their evolutionary relationships, considering both remote and close homology. The family group in ECOD classifies domains that are closely related to each other based on sequence similarity. Because ECOD and sequence-based resources define domains from different perspectives, directly applying existing sequence domain databases, such as Pfam, to ECOD has several shortcomings.

RESULTS

We created multiple sequence alignments and profiles from ECOD domains, using structural information to build the alignments and delineate domain boundaries. We validated alignment quality by scoring structure superpositions, showing that the alignments are comparable to curated seed alignments in Pfam. Comparison to Pfam and CDD reveals that 27% and 16% of ECOD families, respectively, are new; these new families are dominated by small families, likely because of sampling bias in the PDB. The boundaries of 35% and 48% of families are modified compared to their counterparts in Pfam and CDD, respectively.
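To make the "new" versus "modified boundary" categories concrete, the following is a minimal sketch of how one domain range might be compared against a counterpart from another database. The abstract does not state the criterion the authors used, so the function names, the 10-residue tolerance, and the three labels are all assumptions made for illustration only.

```python
from typing import Optional, Tuple

# A domain range as (start, end), 1-based and inclusive (assumed convention).
Interval = Tuple[int, int]


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def classify_boundary(
    ecod: Interval,
    other: Optional[Interval],
    tolerance: int = 10,  # hypothetical threshold, not taken from the paper
) -> str:
    """Label an ECOD domain range relative to a counterpart range.

    Returns "new" when there is no overlapping counterpart, "consistent"
    when both ends agree within `tolerance` residues, and "modified"
    otherwise.
    """
    if other is None or overlap_length(ecod, other) == 0:
        return "new"
    start_shift = abs(ecod[0] - other[0])
    end_shift = abs(ecod[1] - other[1])
    if start_shift <= tolerance and end_shift <= tolerance:
        return "consistent"
    return "modified"
```

For example, classify_boundary((10, 120), (60, 200)) gives "modified" under this assumed tolerance, because the start positions differ by 50 residues.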

AVAILABILITY AND IMPLEMENTATION

The new families are now integrated into the ECOD website. The aggregate HMMER profile library and alignment are available for download from the ECOD website (http://prodata.swmed.edu/ecod).
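As an illustration of how a downloaded profile library might be used, the sketch below reads per-domain hits from HMMER's `--domtblout` table. It assumes the library has already been indexed with `hmmpress` and searched with `hmmscan --domtblout`; the file names in the usage note and the `DomainHit` and `parse_domtblout` names are hypothetical, and the column positions follow the standard domtblout layout rather than anything specific to ECOD.

```python
from typing import Iterable, Iterator, NamedTuple


class DomainHit(NamedTuple):
    family: str      # target name: the profile (family) that matched
    query: str       # query name: the sequence that was searched
    i_evalue: float  # independent E-value of this domain hit
    env_from: int    # envelope start on the query sequence
    env_to: int      # envelope end on the query sequence


def parse_domtblout(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Yield one DomainHit per data row of an hmmscan --domtblout table."""
    for line in lines:
        # Skip blank lines and the '#' header/footer comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # A data row has 22 fixed columns followed by a free-text description.
        if len(fields) < 22:
            continue
        yield DomainHit(
            family=fields[0],
            query=fields[3],
            i_evalue=float(fields[12]),
            env_from=int(fields[19]),
            env_to=int(fields[20]),
        )
```

A typical use, with hypothetical file names, would be to run `hmmpress ecod_library.hmm` and `hmmscan --domtblout hits.domtblout ecod_library.hmm query.fasta`, then pass the open `hits.domtblout` file to `parse_domtblout`. The envelope coordinates are used here because they describe the extent of each domain on the query; the alignment coordinates in the two preceding columns would be an equally reasonable choice.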

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
