Center for Computational Health, IBM Research, Yorktown Heights, NY, USA.
Computational Genomics, IBM Research, Yorktown Heights, NY, USA.
AMIA Annu Symp Proc. 2022 Feb 21;2021:378-387. eCollection 2021.
To date, 180 million confirmed cases of COVID-19, including more than 3.8 million deaths, have been reported to WHO worldwide. In this paper, we address the problem of understanding how the host genome, in concert with clinical variables, influences the severity of COVID-19 in patients. By applying positive-unlabeled machine learning algorithms, coupled with RubricOE, a state-of-the-art genomic analysis framework, to UK Biobank data, we extract novel insights into this complex interplay. The algorithm is also sensitive enough to detect the changing influence of the emergent B.1.1.7 (alpha) variant of SARS-CoV-2 on disease severity, as well as the effect of changing treatment protocols. The genomic component also implicates biological pathways that can help in understanding the disease etiology. Our work demonstrates that, by carefully interleaving clinical and genomic methodologies, it is possible to build a robust and sensitive model despite significant bias, noise, and incompleteness in both the clinical and the genomic data.
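The abstract names positive-unlabeled (PU) learning but does not say which PU estimator is used. As a generic illustration only, the sketch below implements the well-known Elkan and Noto (2008) approach in plain Python. It is not the authors' pipeline and does not involve RubricOE. It assumes the "selected completely at random" condition, i.e. that the labeled cases are a random sample of all true positives (for instance, hypothetically, participants with a recorded severe outcome as labeled positives and everyone else as unlabeled). The helper names `fit_logistic` and `elkan_noto` and the synthetic two-cluster data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs: List[Vector], ys: List[int], lr: float = 0.1,
                 epochs: int = 1000) -> Tuple[List[float], float]:
    """Fit logistic-regression weights and bias by batch gradient descent."""
    dim = len(xs[0])
    w, b, n = [0.0] * dim, 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def elkan_noto(positives: List[Vector], unlabeled: List[Vector],
               holdout_frac: float = 0.2, seed: int = 0
               ) -> Tuple[Callable[[Vector], float], float]:
    """Return (estimate of P(y=1 | x), label frequency c), Elkan & Noto style."""
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    n_hold = max(1, int(len(pos) * holdout_frac))
    hold, train_pos = pos[:n_hold], pos[n_hold:]

    # Step 1: train g(x) ~ P(s=1 | x), separating labeled (s=1) from unlabeled (s=0).
    xs = train_pos + list(unlabeled)
    ss = [1] * len(train_pos) + [0] * len(unlabeled)
    w, b = fit_logistic(xs, ss)

    def g(x: Vector) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

    # Step 2: estimate c = P(s=1 | y=1) as the mean of g over held-out positives.
    c = sum(g(x) for x in hold) / len(hold)

    # Step 3: under "selected completely at random", P(y=1 | x) = g(x) / c.
    def predict(x: Vector) -> float:
        return min(1.0, g(x) / c)

    return predict, c


# Hypothetical synthetic data: labeled positives plus an unlabeled mixture.
data_rng = random.Random(42)


def blob(cx: float, cy: float, k: int) -> List[Tuple[float, float]]:
    return [(data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0)) for _ in range(k)]


labeled_pos = blob(2.0, 2.0, 50)
unlabeled = blob(2.0, 2.0, 50) + blob(-2.0, -2.0, 50)
predict, c = elkan_noto(labeled_pos, unlabeled)
print(f"c = {c:.2f}, P(y=1 | (2, 2)) = {predict((2.0, 2.0)):.2f}")
```

Logistic regression is used here only to keep the example dependency-free; any classifier that outputs reasonably calibrated probabilities could play the role of g(x).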