Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, 20133, Milano, Italy.
Sci Data. 2022 Jun 1;9(1):260. doi: 10.1038/s41597-022-01348-9.
Since the outbreak of the COVID-19 pandemic, many research organizations have studied the genome of the SARS-CoV-2 virus, and a large body of public resources has been published for monitoring its evolution. While this domain now enjoys an unprecedented richness of information, we have also found several information quality issues. We therefore propose CoV2K, an abstract model for explaining SARS-CoV-2-related concepts and interactions, focusing on viral mutations, their co-occurrence within variants, and their effects. CoV2K provides a clear and concise route map for understanding the different connected types of information related to the virus. It thus drives a process of data and knowledge integration that aggregates information from several current resources, harmonizing their content and overcoming incompleteness and inconsistency issues. CoV2K is available for exploration as a graph that can be queried through a RESTful API, addressing either single entities or paths through their relationships. Practical use cases demonstrate its application to current knowledge inquiries.
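The two query styles mentioned in the abstract, addressing a single entity and following a path through relationships, can be illustrated with a minimal in-memory sketch. This is not the CoV2K API or schema: the class `KnowledgeGraph`, the relation names `contains` and `has_effect`, and the identifiers `variant-A`, `mutation-1`, `mutation-2`, and `effect-X` are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy graph of typed entities and named relationships (illustrative only)."""

    def __init__(self):
        self.nodes = {}                 # entity id -> attribute dict
        self.edges = defaultdict(list)  # entity id -> [(relation, target id)]

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def get(self, node_id):
        """Single-entity query: return the entity's attributes, or None."""
        return self.nodes.get(node_id)

    def follow(self, start, relations):
        """Path query: follow the given relation names, in order, from start."""
        frontier = [start]
        for relation in relations:
            frontier = [
                target
                for node in frontier
                for (rel, target) in self.edges.get(node, [])
                if rel == relation
            ]
        return frontier


# Placeholder data: one variant, two mutations, one effect.
g = KnowledgeGraph()
g.add_node("variant-A", "variant")
g.add_node("mutation-1", "mutation")
g.add_node("mutation-2", "mutation")
g.add_node("effect-X", "effect")
g.add_edge("variant-A", "contains", "mutation-1")
g.add_edge("variant-A", "contains", "mutation-2")
g.add_edge("mutation-1", "has_effect", "effect-X")

entity = g.get("mutation-1")                               # single entity
mutations = g.follow("variant-A", ["contains"])             # one-step path
effects = g.follow("variant-A", ["contains", "has_effect"])  # two-step path
```

In this sketch, a path query is simply a sequence of relation names applied step by step, so a question such as "which effects are associated with the mutations of a given variant" becomes a two-step traversal.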