Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin, China.
Zhuhai Sub Laboratory, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Zhuhai College of Science and Technology, Zhuhai, 519041, Guangdong, China.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac194.
Coronavirus disease 2019 (COVID-19) has infected hundreds of millions of people and caused millions of deaths. Its causative agent, SARS-CoV-2, is an RNA virus and is therefore more prone to mutation than many other viruses. The many problems exposed by this pandemic have made biosafety and biosecurity (hereafter collectively referred to as 'biosafety') a popular and timely topic worldwide. Biosafety research covers a broad and diverse range of topics, so it is important to identify its hotspots and trends quickly through big data analysis. However, data-driven studies of the biosafety research literature remain scarce. We developed a novel topic model based on latent Dirichlet allocation, affinity propagation clustering and the PageRank algorithm (LDAPR) to extract knowledge from biosafety research publications from 2011 to 2020. We then used LDAPR for hotspot and trend analysis and carried out further studies, including annual hot topic extraction, a 10-year keyword evolution trend analysis, topic map construction, hot region discovery and fine-grained correlation analysis of interdisciplinary research topic trends. These analyses revealed information that can guide epidemic prevention work: (1) research enthusiasm for a given infectious disease is related not only to its epidemic characteristics but also to the progress of research on other diseases, and (2) infectious diseases are strongly related to their corresponding microorganisms and are also potentially related to other specific microorganisms. The detailed experimental results and our code are available at https://github.com/KEAML-JLU/Biosafety-analysis.
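The abstract names PageRank as one of the three components of LDAPR but does not say how it is wired to the topic model and the clustering step; the authors' actual implementation is in the repository linked above. As a rough illustration only, the sketch below shows how PageRank could rank topics by importance, assuming a small, made-up topic similarity matrix. The function name `pagerank`, the `similarity` values and the damping factor of 0.85 are all assumptions for illustration and are not taken from the paper.

```python
def pagerank(weights, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a weighted adjacency matrix.

    weights[i][j] is the weight of the edge from node i to node j.
    Returns a list of scores that sums to 1.
    """
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_sums[i] == 0)
        new_rank = []
        for j in range(n):
            incoming = sum(
                rank[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical pairwise similarity between four topics (illustrative values only).
similarity = [
    [0.0, 0.8, 0.6, 0.1],
    [0.8, 0.0, 0.7, 0.0],
    [0.6, 0.7, 0.0, 0.2],
    [0.1, 0.0, 0.2, 0.0],
]

scores = pagerank(similarity)
# Topic indices ordered from most to least central in the similarity graph.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy graph, topic 3 is only weakly connected to the others, so it receives the lowest score. In a real pipeline the similarity matrix would come from the fitted topic model rather than being written by hand.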