Shanghai Information Center for Life Sciences, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
National Genomics Data Center, CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
Sci Data. 2022 Mar 30;9(1):121. doi: 10.1038/s41597-022-01237-1.
The outbreak of Coronavirus Disease 2019 (COVID-19) at the end of 2019 turned into a global pandemic. To help analyze the spread and evolution of the virus, we collated and analyzed data from GISAID on viral genomes, sequence variations, and their temporal and spatial distribution. Information from Wikipedia pages and published research papers was categorized and mined to extract epidemiological data, which were then integrated with the public dataset. Genomic and epidemiological data were matched with public information, and data quality was verified by manual curation. The resulting online database, centered on viral genomic information and epidemiological data, is freely accessible at https://www.biosino.org/kgcov/ and can help identify relevant knowledge and devise epidemic prevention and control policies in collaboration with disease control personnel.