Valdivia-Granda Willy, Larson Francis
Orion Integrated Biosciences Inc., New Rochelle, NY, 10805, USA.
Database (Oxford). 2009;2009:bap014. doi: 10.1093/database/bap014. Epub 2009 Oct 12.
Viruses, viroids and prions are the smallest infectious biological entities that depend on their host for replication. The number of pathogenic viruses is large and their impact on global human health is well documented. Currently, the International Committee on the Taxonomy of Viruses (ICTV) has classified approximately 4379 virus species, while the National Center for Biotechnology Information Viral Genomes Resource (NCBI-VGR) database has mapped 617 705 proteins to eight large taxonomic groups. Despite these efforts, no automated approach is available for mapping the ICTV master list and its officially accepted virus naming to the NCBI-VGR's taxonomical classification. Owing to metagenomic sequencing, the discovery and naming of new viral species is likely to increase at least tenfold. Unfortunately, existing viral databases are not adequately prepared to scale, to maintain and automatically annotate ultra-high-throughput sequences, or to place this information into specific taxonomic categories. ORION-VIRCAT is a scalable and interoperable object-relational database designed to serve as a resource for the integration and verification of taxonomical classifications generated by the ICTV and NCBI-VGR. The current release (v1.0) of ORION-VIRCAT is implemented in PostgreSQL and has been extended to ORACLE, MySQL and SyBase. ORION-VIRCAT automatically mapped and joined 617 705 entries from the NCBI-VGR to the viral naming of the ICTV. This detailed analysis revealed that 399 095 entries from the NCBI-VGR can be mapped to the ICTV classification and that one order, 10 families, 35 genera and 503 species listed in the ICTV disagree with the NCBI-VGR classification schema. Nevertheless, we were able to correct several discrepancies, mapping 234 000 additional entries. Database URL: http://www.orionbiosciences.com/research/orion-vircat.html
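The mapping step described above, joining NCBI-VGR entries to ICTV names and counting what does and does not match, can be illustrated with a small sketch. This is not the ORION-VIRCAT implementation, which the abstract does not describe in detail: the `NcbiEntry` class, the `normalize` and `map_to_ictv` helpers, the accessions and the toy species list are all hypothetical, and name matching here is a plain case- and whitespace-insensitive comparison chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NcbiEntry:
    """Hypothetical stand-in for one NCBI-VGR protein entry."""
    accession: str     # invented identifier, not a real accession
    species_name: str  # taxon name as recorded on the NCBI side


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def map_to_ictv(entries, ictv_species):
    """Split entries into those matching an ICTV species name and those that do not.

    Returns (mapped, unmapped): mapped is a dict from accession to the
    official ICTV spelling; unmapped is a list of entries left for review.
    """
    ictv_index = {normalize(name): name for name in ictv_species}
    mapped, unmapped = {}, []
    for entry in entries:
        official = ictv_index.get(normalize(entry.species_name))
        if official is None:
            unmapped.append(entry)
        else:
            mapped[entry.accession] = official
    return mapped, unmapped


# Toy data, invented for illustration only.
ictv_species = ["Tobacco mosaic virus", "Hepatitis B virus"]
entries = [
    NcbiEntry("P1", "tobacco  mosaic virus"),  # differs only in case and spacing
    NcbiEntry("P2", "Hepatitis B virus"),
    NcbiEntry("P3", "Unclassified phage X"),   # no ICTV counterpart in the toy list
]

mapped, unmapped = map_to_ictv(entries, ictv_species)
print(f"{len(mapped)} mapped, {len(unmapped)} unmapped")
```

In a relational setting the same idea would be a join on a normalized name column, with the unmatched rows kept as the candidates for manual or rule-based correction.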