Ramadan Ahmed, Kamel Ahmed, Taha Alaa, El-Shabrawy Abdelhamid, Abdel-Fatah Noura Anwar
Data Science and Medical Information Department, DataClin Contract Research Organization, Egypt.
Department of Biostatistics and Demography, Faculty of Graduate Studies for Statistical Research, Cairo University, Egypt.
Heliyon. 2020 Nov;6(11):e05575. doi: 10.1016/j.heliyon.2020.e05575. Epub 2020 Nov 24.
Univariate analysis is a tedious way to describe the datasets reported daily when trying to understand the impact and scale of the coronavirus disease (COVID-19) crisis. To capture the full picture and compare situations and consequences across countries, multivariate analytical models are suggested, because they allow the situation in different countries to be visualized and compared more accurately and precisely.
We aimed to use data analysis tools that display the relative positions of data points in fewer dimensions while retaining as much of the variation in the original dataset as possible, and to cluster countries according to their scores on the resulting dimensions.
Principal component analysis (PCA) and the partitioning around medoids (PAM) clustering algorithm were used to analyze COVID-19 data from 56, 82, and 91 countries at three time points, respectively. Eligible countries were those with 500 or more total cases and no missing data.
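The eligibility rule and the PCA step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the five input columns (total cases, total deaths, active cases, critical cases, total recovered), the use of `None` for missing values, standardizing the variables (that is, PCA on the correlation matrix), and power iteration with deflation as the eigen-solver are all assumptions. The helper names `eligible`, `standardize`, `top_eigenpairs`, and `pca_scores` are hypothetical. In practice a library routine such as R's `prcomp(..., scale. = TRUE)` or scikit-learn's `PCA` would be used instead.

```python
import math
import random

# Hypothetical column order (an assumption, not the paper's exact variable set):
# total cases, total deaths, active cases, critical cases, total recovered.
# Missing values are represented as None.


def eligible(rows, min_cases=500):
    """Keep countries with no missing values and >= min_cases total cases."""
    return [r for r in rows
            if all(v is not None for v in r) and r[0] >= min_cases]


def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def top_eigenpairs(c, k, iters=2000, seed=0):
    """Leading k eigenpairs of a symmetric matrix via power iteration + deflation."""
    rng = random.Random(seed)
    p = len(c)
    c = [row[:] for row in c]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = [sum(c[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * c[a][b] * v[b] for a in range(p) for b in range(p))
        pairs.append((lam, v))
        # Remove the found component so the next pass finds the next one
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return pairs


def pca_scores(rows, k=2):
    """Return (scores, eigenpairs, share of total variance) for k components."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    # Correlation matrix of the original variables
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]
    pairs = top_eigenpairs(corr, k)
    # Each country's score on a component is its standardized row projected
    # onto that component's eigenvector (loadings)
    scores = [[sum(zi[j] * v[j] for j in range(p)) for _, v in pairs] for zi in z]
    # The trace of a correlation matrix equals p, the total standardized variance
    shares = [lam / p for lam, _ in pairs]
    return scores, pairs, shares
```

With real data, `rows` would hold one list per country, and the analysis would be repeated at each of the three time points. Naming the components (for example, "Disease Magnitude") comes from interpreting which variables load heavily on each one; that step is not automated here.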
After performing PCA, we generated two scores: a Disease Magnitude score, representing total cases, total deaths, total active cases, and critically ill cases, and a Mortality Recovery Ratio score, representing the ratio of total deaths to total recoveries in a given country.
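Clustering countries by their two scores with PAM (k-medoids) can be sketched as follows. This is a minimal, self-contained illustration of the technique, not the study's implementation. Its assumptions are Euclidean distance on the two scores, a user-chosen number of clusters `k` (the abstract does not state how many clusters were used or how `k` was selected), and the hypothetical helper names `pam` and `total_cost`. The sketch follows the classic two-phase scheme: a greedy BUILD phase, then a SWAP phase that applies the best medoid exchange until no exchange lowers the total distance. An established implementation, such as `pam()` in R's `cluster` package, would normally be preferred.

```python
import math


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Partitioning around medoids: greedy BUILD, then best-improvement SWAP.

    `points` is a list of coordinate tuples, e.g. hypothetical
    (disease_magnitude_score, mortality_recovery_score) pairs per country.
    Returns (medoid indices, cluster label per point, total cost).
    """
    n = len(points)
    # BUILD: start from the most central point, then add the point that
    # lowers the total cost the most until k medoids are chosen.
    medoids = [min(range(n), key=lambda i: total_cost(points, [i]))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates,
                           key=lambda i: total_cost(points, medoids + [i])))
    cost = total_cost(points, medoids)
    # SWAP: apply the single medoid/non-medoid exchange that lowers the cost
    # the most; stop when no exchange improves it.
    while True:
        best_cost, best_set = cost, None
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:pos] + [h] + medoids[pos + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < best_cost - 1e-12:
                    best_cost, best_set = trial_cost, trial
        if best_set is None:
            break
        medoids, cost = best_set, best_cost
    # Assign each point to its nearest medoid (label = position in `medoids`)
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels, cost
```

Because medoids are actual data points, each cluster is represented by a real country, which makes the groups easy to describe. The algorithm also uses distances directly, so it is less sensitive to outlying countries than k-means.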
Accurate multivariate analyses can be of great value because they can simplify complex concepts, help explore and communicate findings from health datasets, and support decision-making.