Khan Md Hasinur Rahaman, Hossain Ahmed
Institute of Statistical Research and Training, University of Dhaka, Dhaka, Bangladesh.
Department of Public Health, North South University, Dhaka, Bangladesh.
Front Artif Intell. 2020 Nov 23;3:561801. doi: 10.3389/frai.2020.561801. eCollection 2020.
Coronavirus disease 2019 (COVID-19) has developed into a global pandemic, affecting every nation and territory in the world. Machine learning-based approaches are useful for understanding the complexity behind the spread of the disease and how to contain it effectively. Unsupervised learning methods could be useful for evaluating the shortcomings of health facilities in areas of increased infection, as well as the strategies needed to prevent the disease from spreading within or beyond a country. To contribute toward the well-being of society, this paper focuses on the implementation of machine learning techniques for identifying common prevailing public health care facilities and concerns related to COVID-19, as well as the attitudes toward infection prevention strategies held by people from different countries during the current pandemic. Regression tree, random forest, cluster analysis, and principal component analysis are used to analyze global COVID-19 data for 133 countries, obtained from the Worldometer website as of April 17, 2020. The analysis revealed four major clusters among the countries. Eight countries with the highest cumulative infected cases and deaths form the first cluster. Seven countries, the United States, Spain, Italy, France, Germany, the United Kingdom, and Iran, play a vital role in explaining 60% of the total variation through the first component, which is characterized by all variables except the rate variables. The remaining countries explain only 20% of the total variation through the second component, which is characterized only by the rate variables. Most strikingly, the analysis found that the number of tests conducted by a country did not play a vital role in predicting the cumulative number of confirmed cases.
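The share of total variation attributed to each principal component, as reported above, can be illustrated with a minimal sketch. It is not the authors' code and does not use the Worldometer data: the matrix below is synthetic, the column roles (three count-type variables and two rate-type variables) are assumptions made only for illustration, and `pca_explained_variance` is a hypothetical helper name.

```python
import numpy as np


def pca_explained_variance(X):
    """Return (explained-variance ratios, component loadings) for the columns of X.

    Columns are standardized first, so this is PCA on the correlation matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized data are proportional
    # to the variance captured by each principal component.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    var = s**2 / (Z.shape[0] - 1)
    return var / var.sum(), vt


rng = np.random.default_rng(0)
n = 133  # same number of countries as in the paper; the values are synthetic

# Two hypothetical latent factors: outbreak size and a per-capita rate.
size = rng.lognormal(mean=0.0, sigma=1.0, size=n)
rate = rng.normal(size=n)

X = np.column_stack([
    size + 0.1 * rng.normal(size=n),  # stand-in for total cases
    size + 0.1 * rng.normal(size=n),  # stand-in for total deaths
    size + 0.1 * rng.normal(size=n),  # stand-in for total tests
    rate + 0.1 * rng.normal(size=n),  # stand-in for cases per million
    rate + 0.1 * rng.normal(size=n),  # stand-in for deaths per million
])

ratios, loadings = pca_explained_variance(X)
print("explained variance ratio per component:", np.round(ratios, 3))
print("loadings of the first component:", np.round(loadings[0], 3))
```

Each row of `loadings` gives the weight of every variable in one component, so inspecting the first two rows shows which variables characterize each component. The ratios obtained here depend entirely on how the synthetic data were generated and should not be compared with the paper's figures.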