Department of Microbiology and Immunology, Georgetown University Medical Center, Georgetown University, Washington, DC, USA.
Center for Global Health Science and Security, Georgetown University Medical Center, Georgetown University, Washington, DC, USA.
mBio. 2022 Apr 26;13(2):e0298521. doi: 10.1128/mbio.02985-21. Epub 2022 Mar 1.
Data that catalogue viral diversity on Earth have been fragmented across sources, disciplines, and formats, with varying degrees of open sharing, posing challenges for research on macroecology, evolution, and public health. Here, we address this problem by establishing a dynamically maintained database of vertebrate-virus associations, called The Global Virome in One Network (VIRION). The VIRION database has been assembled through both reconciliation of static data sets and integration of dynamically updated databases. These data sources are all harmonized against one taxonomic backbone, including metadata on host and virus taxonomic validity and higher classification; additional metadata on sampling methodology and evidence strength are also available in a harmonized format. In total, the VIRION database is the largest open-source, open-access database of its kind, with roughly half a million unique records that include 9,521 resolved virus "species" (of which 1,661 are ICTV ratified), 3,692 resolved vertebrate host species, and 23,147 unique interactions between taxonomically valid organisms. Together, these data cover roughly a quarter of mammal diversity, a tenth of bird diversity, and ∼6% of the estimated total diversity of vertebrates, and a much larger proportion of their virome than any previous database. We show how these data can be used to test hypotheses about microbiology, ecology, and evolution, and we suggest best practices for handling the unique mix of evidence that coexists in these data.
Animals and their viruses are connected by a sprawling, tangled network of species interactions. Data on the host-virus network are available from several sources, which use different naming conventions and often report metadata at different levels of detail. VIRION is a new database that combines several of these existing data sources, reconciles taxonomy to a single consistent backbone, and reports metadata in a format designed by and for virologists. Researchers can use VIRION to easily answer questions like "Can any fish viruses infect humans?" or "Which bats host coronaviruses?" or to build more advanced predictive models, making it an unprecedented step toward a full inventory of the global virome.
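As a rough illustration of the kind of question mentioned above ("Which bats host coronaviruses?"), the following minimal Python sketch filters a small in-memory table of host-virus records. The field names (`host`, `host_order`, `virus`, `virus_family`, `host_resolved`, `virus_resolved`), the lowercase naming, and the placeholder rows are all assumptions made for illustration; they are not taken from the VIRION release and the rows are not real data.

```python
# Hypothetical record layout: every field name and value below is a placeholder
# chosen for illustration, not the actual VIRION schema or content.
RECORDS = [
    {"host": "bat species a", "host_order": "chiroptera", "virus": "virus 1",
     "virus_family": "coronaviridae", "host_resolved": True, "virus_resolved": True},
    # Same association reported by a second (hypothetical) source.
    {"host": "bat species a", "host_order": "chiroptera", "virus": "virus 1",
     "virus_family": "coronaviridae", "host_resolved": True, "virus_resolved": True},
    {"host": "bat species a", "host_order": "chiroptera", "virus": "virus 2",
     "virus_family": "rhabdoviridae", "host_resolved": True, "virus_resolved": True},
    {"host": "bat species b", "host_order": "chiroptera", "virus": "virus 1",
     "virus_family": "coronaviridae", "host_resolved": True, "virus_resolved": True},
    {"host": "rodent species c", "host_order": "rodentia", "virus": "virus 3",
     "virus_family": "coronaviridae", "host_resolved": True, "virus_resolved": True},
    # Host name could not be matched to the taxonomic backbone.
    {"host": "bat species d", "host_order": "chiroptera", "virus": "virus 4",
     "virus_family": "coronaviridae", "host_resolved": False, "virus_resolved": True},
]


def is_taxonomically_valid(record):
    """Keep only records whose host and virus names both resolved."""
    return record["host_resolved"] and record["virus_resolved"]


def hosts_in_order_with_family(records, host_order, virus_family):
    """Return the sorted, de-duplicated hosts in one order that carry one virus family."""
    hosts = {
        r["host"]
        for r in records
        if is_taxonomically_valid(r)
        and r["host_order"] == host_order
        and r["virus_family"] == virus_family
    }
    return sorted(hosts)


def unique_interactions(records):
    """Collapse repeated reports into a set of distinct (host, virus) pairs."""
    return {(r["host"], r["virus"]) for r in records if is_taxonomically_valid(r)}


if __name__ == "__main__":
    print(hosts_in_order_with_family(RECORDS, "chiroptera", "coronaviridae"))
    print(len(unique_interactions(RECORDS)))
```

Filtering on the two validity flags before counting mirrors the distinction the abstract draws between raw records and interactions "between taxonomically valid organisms"; a real analysis would also need to decide how to weigh the evidence-strength metadata.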