Centre for Microbiology and Environmental Systems Science, University of Vienna, 1030 Vienna, Austria.
Doctoral School of Microbiology and Environmental Systems Science, University of Vienna, 1030 Vienna, Austria.
Viruses. 2024 Jul 25;16(8):1191. doi: 10.3390/v16081191.
Computational models of homologous protein groups are essential in sequence bioinformatics. Because viruses are diverse and evolve rapidly, grouping the protein sequences encoded in virus genomes is particularly challenging. The low sequence similarity of homologous genes in viruses requires specific approaches for sequence- and structure-based clustering. Furthermore, the annotation of virus genomes in public databases is not as consistent or up to date as that of many cellular genomes. To tackle these problems, we have developed VOGDB, a database of virus orthologous groups. VOGDB is a multi-layer database that progressively clusters viral genes into groups connected by increasingly remote similarity. The first layer is based on pairwise sequence similarities, the second on sequence profile alignments, and the third uses predicted protein structures to find the most remote similarities. VOGDB groups allow for more sensitive homology searches of novel genes and increase the chance of predicting annotations or inferring phylogeny. VOGDB uses all virus genomes from RefSeq and partially reannotates them, and it is updated with every RefSeq release. The unique feature of VOGDB is the inclusion of both prokaryotic and eukaryotic viruses in the same clustering process, which makes it possible to explore ancient evolutionary relationships between the two groups. VOGDB is freely available at vogdb.org under the CC BY 4.0 license.
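The layered grouping described in the abstract can be illustrated with a toy sketch. This is not the VOGDB pipeline: the function names `cluster` and `layered_groups`, the protein identifiers, and the similarity links are all hypothetical, and single-linkage merging is assumed purely for illustration. Each layer contributes links from a more sensitive comparison (pairwise sequence, profile alignment, predicted structure), so groups can only grow from one layer to the next.

```python
from collections import defaultdict


def cluster(items, edges):
    """Single-linkage grouping: items joined by any chain of edges share a group."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in items:
        groups[find(x)].add(x)
    # Sort groups by their sorted members so the output order is deterministic.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def layered_groups(proteins, layer_edges):
    """Return one grouping per layer; each layer adds its links to all earlier ones."""
    accumulated = []
    result = []
    for edges in layer_edges:
        accumulated.extend(edges)
        result.append(cluster(proteins, accumulated))
    return result


# Hypothetical proteins and similarity links (made up for this example).
proteins = ["p1", "p2", "p3", "p4", "p5"]
layer_edges = [
    [("p1", "p2")],  # layer 1: pairwise sequence similarity
    [("p2", "p3")],  # layer 2: sequence profile alignment
    [("p3", "p4")],  # layer 3: predicted structure similarity
]

layers = layered_groups(proteins, layer_edges)
for number, groups in enumerate(layers, start=1):
    print(f"layer {number}:", [sorted(g) for g in groups])
```

In this toy input, the number of groups shrinks from four to three to two as more remote similarities are admitted, which mirrors the idea of groups "connected by increasingly remote similarity" without claiming anything about the thresholds or tools VOGDB actually uses.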