Afzal Asif, Ansari Zahid, Alshahrani Saad, Raj Arun K, Saheer Kuruniyan Mohamed, Ahamed Saleel C, Nisar Kottakkaran Sooppy
Department of Mechanical Engineering, P. A. College of Engineering (Affiliated to Visvesvaraya Technological University, Belagavi), Mangaluru, India.
Electrical Engineering Section, University Polytechnic, Aligarh Muslim University, Aligarh, India.
Results Phys. 2021 Oct;29:104639. doi: 10.1016/j.rinp.2021.104639. Epub 2021 Aug 21.
In this work, partitioning clustering of COVID-19 data is carried out using the c-Means (c-M) and Fuzzy c-Means (Fc-M) algorithms. Daily confirmed cases, recoveries, and deaths reported since January 2020 are clustered with respect to location, i.e., longitude and latitude. The maximum cluster size is treated as a variable and varied from 5 to 50 in both algorithms to find an optimum number. The quality of the resulting clusters is assessed with performance and validity indices: the Zahid SC (Separation Compaction) index, the Xie-Beni index, the Fukuyama-Sugeno index, the validity function, and the PC (performance coefficient) and CE (entropy) indices. The analysis identified five clusters as major centroids where the pandemic appears concentrated. The observations also indicate that the pandemic spreads readily to any global location and that several COVID-19 centroids act primarily as epicentres. However, three main COVID-19 clusters are identified: 1) cases below 50,000, 2) cases between 0.1 million and 2 million, and 3) cases above 2 million. These centroids are located in the US, Brazil, and India, around which the remaining smaller clusters appear to be oriented. Furthermore, the Fc-M technique appears to provide much better clusters than the c-M algorithm.
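The abstract does not give the authors' implementation, so the following is only a minimal sketch of the general technique it names: a textbook fuzzy c-means loop together with three of the validity indices mentioned. It assumes a fuzzifier m = 2, Euclidean distance, and random initial memberships, none of which are stated in the abstract. It also assumes that PC and CE correspond to Bezdek's partition coefficient and classification entropy, although the abstract expands PC as "performance coefficient". All function names are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means; returns (centroids, memberships).

    points is a list of equal-length tuples, c the number of clusters.
    m, iters, tol and seed are illustrative defaults, not the paper's.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial membership matrix u[i][k]; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centroids = []
    for _ in range(iters):
        # Centroid update: mean of the points weighted by u^m.
        centroids = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centroids.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update from the distances to the new centroids.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], ck), 1e-12) for ck in centroids]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                          for j in range(c))
                for k in range(c)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centroids, u


def partition_coefficient(u):
    """Bezdek's PC in [1/c, 1]; higher means crisper clusters."""
    return sum(v * v for row in u for v in row) / len(u)


def classification_entropy(u):
    """Bezdek's CE, non-negative; lower means crisper clusters."""
    return -sum(v * math.log(v) for row in u for v in row if v > 0) / len(u)


def xie_beni(points, centroids, u, m=2.0):
    """Xie-Beni index: compactness over separation; lower is better."""
    compact = sum(
        u[i][k] ** m * math.dist(points[i], centroids[k]) ** 2
        for i in range(len(points)) for k in range(len(centroids)))
    separation = min(
        math.dist(a, b) ** 2
        for idx, a in enumerate(centroids) for b in centroids[idx + 1:])
    return compact / (len(points) * separation)
```

To mirror the sweep described in the abstract, one could call `fuzzy_c_means` for each candidate cluster count and compare the indices across runs. The input would be a list of tuples such as (longitude, latitude, cases), typically rescaled so that no single feature dominates the distance; that feature layout is an assumption here, not something the abstract specifies.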