Department of Medical Biology, The University of Melbourne, Parkville, VIC, 3010, Australia.
Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia.
F1000Res. 2024 Jul 26;10:830. doi: 10.12688/f1000research.55370.2. eCollection 2021.
COVID-19, caused by SARS-CoV-2, has resulted in a global pandemic and a rapidly developing global health and economic crisis. Variations in the disease have been observed and have been associated with the genomic sequence of either the human host or the pathogen. Scientists worldwide initially scrambled to recruit patient cohorts to try to identify risk factors. A resource that presented itself early on was the UK Biobank (UKBB), which investigates the respective contributions of genetic predisposition and environmental exposure to the development of disease. To enable COVID-19 studies, UKBB now receives COVID-19 test data for its participants every two weeks. In addition, UKBB delivers more frequent updates of death and hospital inpatient data (including critical care admissions) on the UKBB Data Portal. This frequently changing dataset requires a tool that can rapidly process and analyse up-to-date data. We developed an R package specifically for the UKBB COVID-19 data, which summarises COVID-19 test results, performs association tests between COVID-19 susceptibility/severity and potential risk factors such as age, sex, blood type and comorbidities, and generates input files for genome-wide association studies (GWAS). By applying the R package to data released in April 2021, we found that age, body mass index, socioeconomic status and smoking are positively associated with COVID-19 susceptibility, severity, and mortality. Males are at a higher risk of COVID-19 infection than females. People staying in aged care homes have a higher chance of being exposed to SARS-CoV-2. By performing GWAS, we replicated the 3p21.31 genetic finding for COVID-19 susceptibility and severity. The ability to perform such analyses iteratively is highly relevant because the UKBB data are updated frequently. As a caveat, users must arrange their own access to the UKBB data to use the R package.