Politecnico di Milano.
Brief Bioinform. 2021 Mar 22;22(2):664-675. doi: 10.1093/bib/bbaa359.
With the outbreak of the COVID-19 disease, the research community is making unprecedented efforts to better understand and mitigate the effects of the pandemic. In this context, we review the data integration efforts required for accessing and searching genome sequences and metadata of SARS-CoV2, the virus responsible for the COVID-19 disease, which have been deposited into the most important repositories of viral sequences. Organizations that were already active in the virus domain are now devoting special attention to the COVID-19 pandemic by emphasizing specific SARS-CoV2 data and services. At the same time, new organizations and resources have emerged in this critical period specifically to support COVID-19 mitigation, while laying the research groundwork for countering possible future pandemics. Accessibility and integration of viral sequence data, possibly in conjunction with human host genotype and clinical data, are paramount to better understand the COVID-19 disease and mitigate its effects. Few examples of host-pathogen integrated datasets exist so far, but we expect their number to grow together with knowledge of the COVID-19 disease. Once such datasets are available, useful integrative surveillance mechanisms can be put in place by observing how common variants are distributed in time and space and relating them to the phenotypic impact reported in the literature.