Cui Wanting, Cabrera Manuel, Finkelstein Joseph
Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Columbia University Irving Medical Center, NY, USA.
Stud Health Technol Inform. 2020 Nov 23;275:32-36. doi: 10.3233/SHTI200689.
The goal of this paper was to apply unsupervised machine learning techniques to discover latent COVID-19 clusters in patients with chronic lower respiratory diseases (CLRD). Patients who underwent testing for SARS-CoV-2 were identified from electronic medical records. The analytical dataset comprised 2,328 CLRD patients, of whom 1,029 tested positive for COVID-19. We used factor analysis of mixed data (FAMD) for preprocessing. It performs principal component analysis on numeric variables and multiple correspondence analysis on categorical variables, which converts the categorical data into numeric form. Cluster analysis was an effective means both to distinguish subgroups of CLRD patients with COVID-19 and to identify patient clusters that were adversely affected by the infection. Age, comorbidity index, and race were important factors for cluster separation. Furthermore, diseases of the circulatory system, the nervous system and sense organs, the digestive system, and the genitourinary system, as well as metabolic diseases and immunity disorders, were also important criteria in the resulting cluster analyses.
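The preprocessing and clustering pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the number of clusters, or the software used, so the k-means step, `k=3`, the helper names `famd` and `kmeans`, and the synthetic variables (`age`, `comorbidity`, `race`, `circulatory`) are all assumptions made for illustration. The factor analysis step follows the usual textbook construction: numeric columns are standardized, each categorical level becomes an indicator column scaled by the inverse square root of its frequency, and a principal component analysis (here via SVD) is run on the combined centered matrix.

```python
import numpy as np


def famd(numeric, categorical, n_components=2):
    """Factor analysis of mixed data via SVD (illustrative helper, not a library API).

    numeric: (n, p) float array; categorical: list of length-n label arrays.
    Returns row coordinates and the share of inertia of each kept component.
    """
    # Numeric block: z-score each column, as in PCA on standardized variables.
    z = (numeric - numeric.mean(axis=0)) / numeric.std(axis=0)
    blocks = [z]
    for col in categorical:
        col = np.asarray(col)
        levels = np.unique(col)
        indicator = (col[:, None] == levels[None, :]).astype(float)
        prop = indicator.mean(axis=0)  # frequency of each level
        # MCA-style weighting: rare levels get more weight, then center.
        scaled = indicator / np.sqrt(prop)
        blocks.append(scaled - scaled.mean(axis=0))
    x = np.hstack(blocks)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return coords, explained


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; an assumed stand-in for the paper's cluster analysis."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic toy data (hypothetical, NOT the study's records).
rng = np.random.default_rng(42)
n = 60
age = rng.normal(60, 15, n)
comorbidity = rng.poisson(3, n).astype(float)
numeric = np.column_stack([age, comorbidity])
race = rng.choice(["A", "B", "C"], n)
circulatory = rng.choice(["yes", "no"], n)

coords, explained = famd(numeric, [race, circulatory], n_components=2)
labels, centers = kmeans(coords, k=3)
```

In this sketch, `coords` holds one low-dimensional numeric vector per patient, which is what allows a distance-based clustering method to operate on data that originally mixed numeric and categorical fields.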