Pfam 10 years on: 10,000 families and still growing.

Authors

Sammut Stephen John, Finn Robert D, Bateman Alex

Affiliation

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambridgeshire, CB10 1SA, UK.

Publication information

Brief Bioinform. 2008 May;9(3):210-9. doi: 10.1093/bib/bbn010. Epub 2008 Mar 15.

DOI:10.1093/bib/bbn010

PMID:18344544

Abstract

Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10,000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis a further 28,000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.
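The coverage figures in the abstract can be illustrated with a minimal sketch. It assumes that "coverage" means the fraction of sequences with at least one family match, which is one plausible reading of the abstract and not necessarily the authors' exact definition. The function name `sequence_coverage` and all sequence and family identifiers are hypothetical toy values, not real Pfam or sequence database accessions. Only the 72% and 95% figures come from the abstract.

```python
def sequence_coverage(matches_by_sequence):
    """Return the fraction of sequences that have at least one family match."""
    if not matches_by_sequence:
        raise ValueError("need at least one sequence")
    matched = sum(1 for families in matches_by_sequence.values() if families)
    return matched / len(matches_by_sequence)


# Toy database (hypothetical identifiers): sequence -> matching families.
database = {
    "seq1": ["famA"],
    "seq2": ["famA", "famB"],
    "seq3": [],
    "seq4": ["famC"],
}
before = sequence_coverage(database)

# New sequences are added while the set of families stays fixed.
# Assumption for illustration: most newcomers match no existing family.
database.update({"seq5": [], "seq6": ["famB"], "seq7": [], "seq8": []})
after = sequence_coverage(database)

# Figures quoted in the abstract.
current_coverage = 0.72
upper_bound = 0.95
gap_points = round((upper_bound - current_coverage) * 100)

print(f"toy coverage before: {before:.2f}, after: {after:.2f}")
print(f"gap to the proposed upper bound: {gap_points} percentage points")
```

In the toy data, coverage drops once unmatched sequences are added, which mirrors the abstract's point that a fixed set of families matches a shrinking fraction of a growing sequence database.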
