Identification of patient demographic, clinical, and SARS-CoV-2 genomic factors associated with severe COVID-19 using supervised machine learning: a retrospective multicenter study.

Authors

Nirmalarajah Kuganya, Aftanas Patryk, Barati Shiva, Chien Emily, Crowl Gloria, Faheem Amna, Farooqi Lubna, Jamal Alainna J, Khan Saman, Kotwa Jonathon D, Li Angel X, Mozafarihashjin Mohammad, Nasir Jalees A, Shigayeva Altynay, Yim Winfield, Yip Lily, Zhong Xi Zoe, Katz Kevin, Kozak Robert, McArthur Andrew G, Daneman Nick, Maguire Finlay, McGeer Allison J, Duvvuri Venkata R, Mubareka Samira

Affiliations

Sunnybrook Research Institute, Toronto, ON, Canada.

Public Health Ontario, 661 University Avenue, Toronto, ON, Canada.

Publication information

BMC Infect Dis. 2025 Jan 28;25(1):132. doi: 10.1186/s12879-025-10450-3.

Abstract

BACKGROUND

The drivers of COVID-19 severity are multifactorial: viral determinants and host-related factors (i.e., demographics, pre-existing conditions, and/or genetics) are multidimensional and may interact, which complicates the prediction of clinical outcomes for different severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants. Although millions of SARS-CoV-2 genomes have been shared publicly in global databases, linkages with detailed clinical data are scarce. We therefore aimed to establish a COVID-19 patient dataset with linked clinical and viral genomic data and then to examine associations between SARS-CoV-2 genomic signatures and clinical disease phenotypes.

METHODS

A cohort of adult patients with laboratory-confirmed SARS-CoV-2 infection was recruited from 11 participating healthcare institutions in the Greater Toronto Area (GTA) between March 2020 and April 2022. Supervised machine learning (ML) models were developed to predict hospitalization using SARS-CoV-2 lineage-specific genomic signatures, patient demographics, symptoms, and pre-existing comorbidities. The relative importance of these features was then evaluated.
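As a rough illustration of the kind of supervised model described above, the following sketch fits a logistic regression by gradient descent to predict a binary hospitalization label from binary features. Everything in it is an assumption made for illustration: the abstract does not state which model families were used, the feature names are hypothetical, and the cohort is synthetic, generated from an arbitrary rule rather than from study data.

```python
import math
import random

# Hypothetical binary features; names are illustrative, not the study's schema.
FEATURES = ["age_over_60", "vascular_disease", "pulmonary_disease",
            "fever", "pre_voc_lineage"]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    # Predicted probability of hospitalization for one feature vector.
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def log_loss(weights, bias, X, y):
    # Mean binary cross-entropy over the dataset.
    eps = 1e-12
    total = 0.0
    for x, label in zip(X, y):
        p = min(max(predict_proba(weights, bias, x), eps), 1 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(X)

def fit_logistic(X, y, lr=0.5, epochs=500):
    # Full-batch gradient descent on the mean log loss.
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        bias -= lr * grad_b / len(X)
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
    return weights, bias

def make_synthetic_cohort(n=200, seed=0):
    # Synthetic patients: labels come from an arbitrary made-up rule.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in FEATURES]
        score = 1.5 * x[0] + 1.5 * x[1] + x[2] + x[3] + 0.3 * x[4] - 2.5
        X.append(x)
        y.append(1 if rng.random() < sigmoid(score) else 0)
    return X, y

X, y = make_synthetic_cohort()
initial_loss = log_loss([0.0] * len(FEATURES), 0.0, X, y)
weights, bias = fit_logistic(X, y)
final_loss = log_loss(weights, bias, X, y)
print(f"log loss: {initial_loss:.3f} -> {final_loss:.3f}")
for name, w in zip(FEATURES, weights):
    print(f"{name}: {w:+.2f}")
```

A real analysis would hold out a test set and report a discrimination metric; the sketch only shows the fit-then-inspect pattern on toy data.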

RESULTS

Complete clinical data and viral whole-genome information were obtained from 617 patients, 50.4% of whom were hospitalized. Notably, inpatients were older, with a mean age of 66.67 years (SD ± 17.64 years), whereas outpatients had a mean age of 44.89 years (SD ± 16.00 years). SHapley Additive exPlanations (SHAP) analyses revealed that underlying vascular disease, underlying pulmonary disease, and fever were the most significant clinical features associated with hospitalization. In models built on the amino acid sequences of functional regions, including the spike, nucleocapsid, ORF3a, and ORF8 proteins, variants that preceded the emergence of variants of concern (VOCs), i.e., pre-VOC variants, were associated with hospitalization.
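SHAP attributes a model's prediction to its input features using Shapley values. For a handful of features these can be computed exactly by enumerating feature coalitions, as the sketch below does. It is a toy under stated assumptions: `toy_risk_score` and its coefficients are invented for illustration and are not estimates from the study, and "absent" features are replaced by a fixed baseline value, which is a simplification of how SHAP libraries typically average over background data.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`.

    Features outside a coalition take their baseline value (an assumption
    of this sketch, sometimes called baseline Shapley).
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_risk_score(x):
    # Hypothetical score with made-up coefficients and one interaction term.
    vascular, pulmonary, fever = x
    return (0.2 + 0.3 * vascular + 0.2 * pulmonary + 0.1 * fever
            + 0.1 * vascular * pulmonary)

names = ["vascular_disease", "pulmonary_disease", "fever"]
patient = [1, 1, 0]    # hypothetical patient: both comorbidities, no fever
reference = [0, 0, 0]  # baseline: none of the three features present

phi = shapley_values(toy_risk_score, patient, reference)
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.3f}")
```

For this patient the attributions are 0.35, 0.25, and 0.00: the interaction term is split evenly between the two comorbidities, and the three values sum to the difference between the patient's score (0.8) and the baseline score (0.2).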

CONCLUSIONS

Viral genomic features have limited utility in predicting hospitalization across SARS-CoV-2 diversity. Combining clinical and viral genomic datasets provides perspective on patient-specific and virus-related factors that impact COVID-19 disease severity. Overall, clinical features had greater discriminatory power than viral genomic features in predicting hospitalization.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83d5/11773898/7583a06ff2e2/12879_2025_10450_Fig1_HTML.jpg
