Sonbhadra Sanjay Kumar, Agarwal Sonali, Nagabhushan P
IIIT Allahabad, Prayagraj, U.P. India 211015.
Chaos Solitons Fractals. 2020 Nov;140:110155. doi: 10.1016/j.chaos.2020.110155. Epub 2020 Jul 30.
The novel coronavirus disease 2019 (COVID-19) began as an outbreak from its epicentre in Wuhan, People's Republic of China, in late December 2019, and by June 27, 2020 it had caused 9,904,906 infections and 496,866 deaths worldwide. The World Health Organization (WHO) has declared this disease a pandemic. Researchers from various domains are working to curb the spread of the coronavirus through medical treatment and data analytics. In recent years, many research articles have been published on coronavirus-caused diseases such as severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS) and COVID-19. Given the volume of these articles, manually extracting the best-suited ones is time-consuming and impractical. The objective of this paper is to extract the activity and trends of coronavirus-related research articles using machine learning approaches, to help the research community in future exploration of COVID-19 prevention and treatment techniques. The COVID-19 Open Research Dataset (CORD-19) is used for the experiments, and several target tasks, along with explanations, are defined for classification based on domain knowledge. Clustering techniques are used to group the available articles into clusters, and the task assignment is then performed using parallel one-class support vector machines (OCSVMs). The defined tasks describe the behavior of the clusters, accomplishing target-class guided mining. Experiments with the original and reduced features validate the performance of the approach. The results show that the k-means clustering algorithm followed by parallel OCSVMs outperforms the other methods in both the original and the reduced feature space.
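The abstract does not spell out the pipeline's internals, so the following is only a minimal sketch in Python with scikit-learn of the "k-means, then parallel OCSVMs" idea. It assumes TF-IDF features over article text, truncated SVD as the feature-reduction step, one OCSVM fitted per k-means cluster, and assignment of each target-task description to the cluster whose OCSVM gives it the highest decision score. The function name `cluster_and_assign`, its parameters (`n_clusters`, `n_components`, `seed`), the value `nu=0.1`, and the assignment rule are hypothetical illustrations, not the authors' code or configuration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM


def cluster_and_assign(articles, tasks, n_clusters=3, n_components=None, seed=0):
    """Hypothetical sketch: k-means clustering followed by parallel OCSVMs.

    Returns (labels, assigned): the k-means cluster of every article and,
    for every task description, the index of the cluster it is assigned to.
    """
    # Assumed features: TF-IDF over the raw article text.
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(articles)
    T = vectorizer.transform(tasks)

    if n_components is not None:
        # Assumed reduction: truncated SVD fitted on the article matrix only.
        svd = TruncatedSVD(n_components=n_components, random_state=seed)
        X = svd.fit_transform(X)
        T = svd.transform(T)

    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)

    def fit_ocsvm(c):
        # Each one-class SVM is trained only on the articles of cluster c.
        rows = np.flatnonzero(labels == c)
        return OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X[rows])

    # The per-cluster models are independent, so they can be fitted in parallel.
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(fit_ocsvm, range(n_clusters)))

    # scores[i, c] is the decision value of cluster c's OCSVM for task i.
    scores = np.column_stack([m.decision_function(T) for m in models])
    return labels, scores.argmax(axis=1)
```

Passing `n_components` runs the same pipeline in a reduced feature space, mirroring the original-versus-reduced comparison in the abstract; the value is illustrative. Decision values from separately fitted OCSVMs are not calibrated against one another, so taking the argmax across clusters is a simplification chosen for brevity.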