Aryani Amir, Wang Jingbo, Salvador-Carulla Luis, Woo Jihoon, Cheung Cathy P W, Wu Zhuochen, Yin Hui, Xiao Junhua, Lambert Elisabeth A, Howitt Jason, Davidson Jean M, Yoong Serene, Dixon John B, Climie Rachel E, Salinas-Perez Jose A, Bagheri Nasser, Santiago Celine, Williams Joanne, Wickramasinghe Nilmini, Ng Leo, Zwack Clara C, Lambert Gavin W
Swinburne University of Technology, Melbourne, Australia.
National Computational Infrastructure, The Australian National University, Canberra, Australia.
Sci Data. 2025 Jun 10;12(1):978. doi: 10.1038/s41597-025-04992-z.
Research publications on Coronaviruses, particularly COVID-19, have significantly shaped our knowledge base. While the urgency of monitoring COVID-19 in real time has decreased, the continuing monthly influx of new research articles underscores the importance of systematic review and analysis for deepening our understanding of the pandemic's broad impact. To explore research trends and innovations in this space, we developed a pipeline based on natural language processing techniques. The pipeline systematically catalogues and synthesises the vast array of research articles, yielding a dataset of more than eight hundred thousand articles spanning July 2002 to May 2024. This paper describes the content of the dataset and provides the information needed to make it accessible and reusable for future research. Our approach aggregates global research related to Coronaviruses and organises it into thematic clusters such as vaccine development, public health strategies, infection mechanisms, mental health issues, and economic consequences. Health experts also contributed by reviewing and revising the dataset.
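As a rough illustration of what organising articles into thematic clusters can look like, the sketch below assigns an abstract to the theme whose seed vocabulary it most resembles, using bag-of-words cosine similarity. It is not the authors' pipeline, whose methods are not described here. The theme names are taken from the abstract above; the seed terms, the function names (tokenize, cosine, assign_theme) and the similarity measure are assumptions made for this example only.

```python
import math
import re
from collections import Counter

# Theme names come from the abstract; the seed terms are hypothetical
# and chosen only to make the example work.
THEME_SEEDS = {
    "vaccine development": "vaccine vaccination immunisation antibody dose trial",
    "public health strategies": "lockdown quarantine policy distancing surveillance mask",
    "infection mechanisms": "spike protein receptor ace2 replication viral entry",
    "mental health issues": "anxiety depression stress loneliness wellbeing",
    "economic consequences": "economy unemployment market gdp recession income",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_theme(abstract):
    """Return the best-matching theme name, or None if no seed term occurs."""
    vec = Counter(tokenize(abstract))
    scores = {
        theme: cosine(vec, Counter(tokenize(seed)))
        for theme, seed in THEME_SEEDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(assign_theme("Antibody response after a second vaccine dose in a randomised trial"))
```

A production pipeline over hundreds of thousands of articles would more plausibly rely on learned text representations and unsupervised clustering, followed by expert review of the resulting themes; the fixed seed lists here only keep the example self-contained.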