Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece.
John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA.
Nucleic Acids Res. 2024 Jan 5;52(D1):D502-D512. doi: 10.1093/nar/gkad800.
The Novel Metagenome Protein Families Database (NMPFamsDB) is a database of metagenome- and metatranscriptome-derived protein families whose members have no hits to proteins of reference genomes or to Pfam domains. Each protein family is accompanied by multiple sequence alignments, Hidden Markov Models, taxonomic information, ecosystem and geolocation metadata, sequence and structure predictions, as well as 3D structure models predicted with AlphaFold2. In its current version, NMPFamsDB hosts over 100 000 protein families, each with at least 100 members. The reported protein families more than double the number of known protein sequence clusters from reference genomes and reveal new insights into their habitat distribution, origins, functions and taxonomy. We expect NMPFamsDB to be a valuable resource for microbial proteome-wide analyses and for further discovery and characterization of novel functions. NMPFamsDB is publicly available at http://www.nmpfamsdb.org/ or https://bib.fleming.gr/NMPFamsDB.