Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Valencia, Spain.
Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Mexico.
JMIR Public Health Surveill. 2022 Mar 30;8(3):e30032. doi: 10.2196/30032.
The COVID-19 pandemic has posed an unprecedented global health care challenge for both medical institutions and researchers. Recognizing COVID-19 subphenotypes (clinically meaningful subgroups of patients defined by their clinical features) and characterizing their severity may assist clinicians during the clinical course, the vaccination process, research efforts, the surveillance system, and the allocation of limited resources.
We aimed to discover age-sex unbiased COVID-19 patient subphenotypes from phenotypical data that are easily available before admission, such as pre-existing comorbidities, lifestyle habits, and demographic features. We then studied the potential of the discovered subgroups for early severity stratification by characterizing their severity patterns, including prognostic, intensive care unit (ICU), and morbimortality outcomes.
We used the Mexican Government COVID-19 open data, comprising population-based, patient-level data on 778,692 SARS-CoV-2 patients as of September 2020. We applied a meta-clustering technique: a 2-stage clustering approach that combines dimensionality reduction (ie, principal component analysis and multiple correspondence analysis) with hierarchical clustering using the Ward minimum variance method with squared Euclidean distance.
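The 2-stage approach described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic binary features, uses principal component analysis only (multiple correspondence analysis is omitted for brevity), and treats stage 2 as Ward clustering of per-cluster mean feature profiles. The function names, the number of components, and the cluster counts are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def pca_scores(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def ward_labels(X, n_clusters):
    """Ward minimum-variance hierarchical clustering, cut into at most n_clusters."""
    Z = linkage(X, method="ward")  # SciPy's Ward variance minimization on Euclidean data
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def two_stage_clustering(X, strata, k_per_stratum, n_meta, n_components=2):
    """Stage 1: cluster each stratum separately. Stage 2: cluster the cluster profiles."""
    profiles = []
    for s in np.unique(strata):
        Xs = X[strata == s]
        labels = ward_labels(pca_scores(Xs, n_components), k_per_stratum)
        for c in np.unique(labels):
            # Summarize each stage-1 cluster by its mean feature profile
            profiles.append(Xs[labels == c].mean(axis=0))
    profiles = np.vstack(profiles)
    meta = ward_labels(profiles, n_meta)  # meta-cluster label for each stage-1 cluster
    return profiles, meta


# Synthetic stand-in data (assumption): 300 patients, 6 binary features
# (e.g. comorbidity flags), and 4 hypothetical age-sex strata.
rng = np.random.default_rng(0)
X = (rng.random((300, 6)) < 0.3).astype(float)
strata = rng.integers(0, 4, size=300)

profiles, meta = two_stage_clustering(X, strata, k_per_stratum=3, n_meta=4)
print(profiles.shape, meta)
```

Clustering each stratum independently keeps age and sex from dominating the partition, and the second stage then groups stage-1 clusters with similar profiles across strata.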
In the independent age-sex clustering analyses, 56 clusters supported 11 clinically distinguishable meta-clusters (MCs). MCs 1-3 showed high recovery rates (90.27%-95.22%), including healthy patients of all ages, children with comorbidities who had priority in receiving medical resources (ie, higher rates of hospitalization, intubation, and ICU admission than adult subgroups with similar conditions), and young obese smokers. MCs 4-5 showed moderate recovery rates (81.30%-82.81%), including patients with hypertension or diabetes of all ages and obese patients with pneumonia, hypertension, and diabetes. MCs 6-11 showed low recovery rates (53.96%-66.94%), including immunosuppressed patients with high comorbidity rates, patients with chronic kidney disease with a poor survival length and probability of recovery, older smokers with chronic obstructive pulmonary disease, older adults with severe diabetes and hypertension, and the oldest obese smokers with chronic obstructive pulmonary disease and mild cardiovascular disease. Group outcomes were consistent with the recent literature on dedicated age-sex groups. Mexican states and several types of clinical institutions showed relevant heterogeneity regarding severity, potentially linked to socioeconomic or health inequalities.
The proposed 2-stage cluster analysis methodology produced a discriminative characterization of the sample and provided explainability over age and sex. These results can potentially help in understanding patients' clinical profiles and in stratifying them for automated early triage before further tests and laboratory results are available, including in locations where additional tests are not available. They may also help decide resource allocation among vulnerable subgroups, for example to prioritize vaccination or treatments.