European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK.
Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden.
Nucleic Acids Res. 2021 Jan 8;49(D1):D412-D419. doi: 10.1093/nar/gkaa913.
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B, which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering produced by the MMseqs2 software. We have compared all of the regions in RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.