Finn Robert D, Coggill Penelope, Eberhardt Ruth Y, Eddy Sean R, Mistry Jaina, Mitchell Alex L, Potter Simon C, Punta Marco, Qureshi Matloob, Sangrador-Vegas Amaia, Salazar Gustavo A, Tate John, Bateman Alex
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Nucleic Acids Res. 2016 Jan 4;44(D1):D279-85. doi: 10.1093/nar/gkv1344. Epub 2015 Dec 15.
In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available, and Pfam annotations for individual UniProtKB sequences can still be retrieved. A small number of Pfam entries (1.6%) that have no matches to reference proteomes remain; we are working with UniProt to see whether sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. A new tool improves the facility for viewing the relationships between families within a clan.
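To illustrate what restricting the reported counts to reference proteomes means in practice, the sketch below counts the distinct sequences and species matched by one family, first over all matches and then only over sequences in a reference-proteome set. It is a minimal illustration under stated assumptions: the `Match` record, the `match_counts` helper, the family names, accessions and species are all hypothetical toy data, not Pfam's actual data model or pipeline.

```python
from collections import namedtuple

# Hypothetical match record: one family hit on one sequence (toy data model).
Match = namedtuple("Match", ["family", "accession", "species"])


def match_counts(matches, family, allowed_accessions=None):
    """Count distinct sequences and species matched by `family`.

    If `allowed_accessions` is given (here standing in for the set of
    reference proteome sequences), matches to any other sequence are ignored.
    """
    seqs, species = set(), set()
    for m in matches:
        if m.family != family:
            continue
        if allowed_accessions is not None and m.accession not in allowed_accessions:
            continue
        seqs.add(m.accession)
        species.add(m.species)
    return len(seqs), len(species)


# Hypothetical toy data: SEQ1 and SEQ2 stand in for reference proteome members.
matches = [
    Match("FAM_A", "SEQ1", "Homo sapiens"),
    Match("FAM_A", "SEQ2", "Mus musculus"),
    Match("FAM_A", "SEQ3", "Strain X"),
    Match("FAM_A", "SEQ4", "Strain X"),
    Match("FAM_B", "SEQ1", "Homo sapiens"),
]
reference = {"SEQ1", "SEQ2"}

print(match_counts(matches, "FAM_A"))             # (4, 3): all sequences
print(match_counts(matches, "FAM_A", reference))  # (2, 2): reference set only
```

The full set of matches is kept and only filtered at counting time, mirroring the point that matches to the whole of UniProtKB remain available even though the headline counts use the smaller reference set.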