Kim Rachel Seongeun, Levy Karin Eli, Mirdita Milot, Chikhi Rayan, Steinegger Martin
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
Nucleic Acids Res. 2025 Jan 6;53(D1):D340-D347. doi: 10.1093/nar/gkae1119.
The AlphaFold Protein Structure Database (AFDB) is the largest repository of accurately predicted structures with taxonomic labels. Despite providing predictions for over 214 million UniProt entries, the AFDB does not cover viral sequences, severely limiting their study. To address this, we created the Big Fantastic Virus Database (BFVD), a repository of 351 242 protein structures predicted by applying ColabFold to the viral sequence representatives of the UniRef30 clusters. By utilizing homology searches across two petabases of assembled sequencing data, we improved 36% of these structure predictions beyond ColabFold's initial results. BFVD holds a unique repertoire of protein structures as over 62% of its entries show no or low structural similarity to existing repositories. We demonstrate how a substantial fraction of bacteriophage proteins, which remained unannotated based on their sequences, can be matched with similar structures from BFVD. In that, BFVD is on par with the AFDB, while holding nearly three orders of magnitude fewer structures. BFVD is an important virus-specific expansion to protein structure repositories, offering new opportunities to advance viral research. BFVD can be freely downloaded at bfvd.steineggerlab.workers.dev and queried using Foldseek and UniProt labels at bfvd.foldseek.com.
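The size comparison and percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the AFDB count uses the abstract's "over 214 million" as a lower bound, and the 36% and 62% figures are treated as exact fractions, so the derived counts are approximations and not values reported by the authors.

```python
import math

# Figures taken from the abstract; variable names are illustrative.
afdb_entries = 214_000_000   # "over 214 million" UniProt entries (lower bound)
bfvd_entries = 351_242       # BFVD predicted structures

# Size ratio and its base-10 logarithm ("nearly three orders of magnitude").
ratio = afdb_entries / bfvd_entries
orders_of_magnitude = math.log10(ratio)

# Approximate entry counts implied by the quoted percentages (assumed exact here).
improved_estimate = round(0.36 * bfvd_entries)  # predictions improved via homology search
novel_estimate = round(0.62 * bfvd_entries)     # entries with no or low structural similarity

print(f"AFDB/BFVD size ratio: {ratio:.0f}x")
print(f"log10 of ratio: {orders_of_magnitude:.2f}")
print(f"about {improved_estimate:,} improved predictions")
print(f"at least about {novel_estimate:,} structurally novel entries")
```

Under these assumptions the ratio is several hundred-fold, which is consistent with the abstract's phrasing of "nearly three orders of magnitude".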