Su Chang, Zhang Yongkang, Flory James H, Weiner Mark G, Kaushal Rainu, Schenck Edward J, Wang Fei
Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
Memorial Sloan-Kettering Cancer Center, New York, NY, USA.
NPJ Digit Med. 2021 Jul 14;4(1):110. doi: 10.1038/s41746-021-00481-w.
Coronavirus disease 2019 (COVID-19) is heterogeneous, and our understanding of the biological mechanisms of host response to the viral infection remains limited. Identification of meaningful clinical subphenotypes may benefit pathophysiological study, clinical practice, and clinical trials. Here, our aim was to derive and validate COVID-19 subphenotypes using machine learning and routinely collected clinical data, assess temporal patterns of these subphenotypes during the pandemic course, and examine their interaction with social determinants of health (SDoH). We retrospectively analyzed 14,418 COVID-19 patients at five major medical centers in New York City (NYC) between March 1 and June 12, 2020. Using clustering analysis, four biologically distinct subphenotypes were derived in the development cohort (N = 8199). Importantly, the identified subphenotypes were highly predictive of clinical outcomes (especially 60-day mortality). Sensitivity analyses in the development cohort, and rederivation and prediction in the internal (N = 3519) and external (N = 3519) validation cohorts, confirmed the reproducibility and usability of the subphenotypes. Further analyses showed that subphenotype prevalence varied across the peak of the outbreak in NYC. We also found that SDoH specifically influenced the mortality outcome in Subphenotype IV, which is associated with older age, worse clinical manifestations, and a high comorbidity burden. Our findings may lead to a better understanding of how COVID-19 causes disease in different populations and potentially benefit clinical trial development. The temporal patterns and SDoH implications of the subphenotypes may add insights to health policy to reduce social disparities during the pandemic.
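The derive-then-validate workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so plain k-means is used here purely as a stand-in, and it should not be read as the authors' method. The data are synthetic, and the names `kmeans`, `assign`, and `synthetic_cohort` are hypothetical. Centroids are derived on a development cohort, and patients in a separate validation cohort are then assigned to the nearest centroid.

```python
import random
from math import dist


def assign(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (a stand-in algorithm); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its group;
        # an empty group keeps its previous centroid.
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def synthetic_cohort(n, rng):
    """Made-up, already-standardized 2-feature patients around 4 invented centers."""
    centers = [(-2, -2), (-2, 2), (2, -2), (2, 2)]
    return [tuple(rng.gauss(m, 0.5) for m in centers[i % 4]) for i in range(n)]


rng = random.Random(42)
development = synthetic_cohort(400, rng)  # assumption: synthetic, not study data
validation = synthetic_cohort(100, rng)

# Derive subphenotype centroids on the development cohort only.
centroids = kmeans(development, k=4)

# Assign each validation patient to the nearest derived centroid.
labels = [assign(p, centroids) for p in validation]
counts = [labels.count(i) for i in range(4)]
print(counts)
```

In a real analysis the features would be routinely collected clinical variables, and each subphenotype would then be compared on outcomes such as 60-day mortality. The number of clusters and the algorithm would need to be justified rather than fixed in advance as they are here.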