Anderson Wesley, Gould Ruth, Patil Namrata, Mohr Nicholas, Dodd Kenneth, Boyce Danielle, Dasher Pam, Guerin Philippe J, Khan Reham, Cheruku Sreekanth, Kumar Vishakha K, Mathé Ewy, Mehta Aneesh K, Michelson Andrew P, Williams Andrew, Heavner Smith F, Podichetty Jagdeep T
Critical Path Institute, Tucson, AZ, United States.
Centers for Disease Control and Prevention, Atlanta, GA, United States.
Front Public Health. 2025 May 15;13:1544904. doi: 10.3389/fpubh.2025.1544904. eCollection 2025.
Heterogeneous diseases such as COVID-19 can vary greatly in presentation and progression, and patient outcomes differ even within the hospital setting. This variability underscores the need for tailored treatment approaches based on distinct clinical subgroups.
This study aimed to identify COVID-19 patient subgroups with unique clinical characteristics using real-world data (RWD) from electronic health records (EHRs) to inform individualized treatment plans.
A Factor Analysis of Mixed Data (FAMD)-based agglomerative hierarchical clustering approach was employed to analyze the real-world data, enabling the identification of distinct patient subgroups. Statistical tests evaluated cluster differences, and machine learning models classified the identified subgroups.
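The general idea of FAMD followed by agglomerative clustering can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state the linkage method, the number of retained components, or the variables used, so Ward linkage, two components, and the column names `age`, `crp`, and `oxygen_support` are all assumptions chosen for illustration, and the data are simulated. FAMD is computed here by hand in its usual formulation (z-scored continuous variables, indicator columns scaled by the inverse square root of the category proportion, then PCA) rather than through a dedicated library.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster


def famd_coordinates(df: pd.DataFrame, n_components: int = 2) -> np.ndarray:
    """Project mixed-type rows onto the leading FAMD components."""
    num = df.select_dtypes(include="number")
    cat = df.select_dtypes(exclude="number")

    # Continuous variables: z-score so each contributes unit variance.
    z = (num - num.mean()) / num.std(ddof=0)

    # Categorical variables: one-hot encode, scale each indicator by
    # 1/sqrt(category proportion), then center the columns.
    dummies = pd.get_dummies(cat).astype(float)
    scaled = dummies / np.sqrt(dummies.mean())
    scaled = scaled - scaled.mean()

    # PCA via SVD of the combined, centered matrix.
    x = np.hstack([z.to_numpy(), scaled.to_numpy()])
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def cluster_patients(df: pd.DataFrame, n_clusters: int = 3,
                     n_components: int = 2) -> np.ndarray:
    """Agglomerative (Ward) clustering on FAMD coordinates."""
    coords = famd_coordinates(df, n_components)
    tree = linkage(coords, method="ward")  # Ward linkage is an assumption
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Simulated toy data with hypothetical variables (not from the study).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "age": np.concatenate([rng.normal(40, 3, 20),
                           rng.normal(60, 3, 20),
                           rng.normal(80, 3, 20)]),
    "crp": np.concatenate([rng.normal(10, 2, 20),
                           rng.normal(60, 5, 20),
                           rng.normal(150, 10, 20)]),
    "oxygen_support": ["none"] * 20 + ["nasal"] * 20 + ["ventilator"] * 20,
})
labels = cluster_patients(toy)
print(pd.Series(labels).value_counts().sort_index())
```

On real EHR data, missing values, the number of components to retain, and the number of clusters would each need to be handled and justified separately; the sketch only shows how mixed continuous and categorical variables can be placed in a common space before hierarchical clustering.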
Three clusters of COVID-19 patients with unique clinical characteristics were identified. Hospital stay durations and survival rates differed significantly among the clusters, with more severe clinical features correlating with worse prognoses. Machine learning classifiers achieved high accuracy in subgroup identification.
By leveraging RWD and advanced clustering techniques, the study provides insights into the heterogeneity of COVID-19 presentations. The findings support the development of classification models that can inform more individualized and effective treatment plans, improving patient outcomes in the future.