Department of Biomedical Engineering, University of California, Irvine, USA.
Hum Genomics. 2020 Oct 9;14(1):36. doi: 10.1186/s40246-020-00288-y.
The course of COVID-19 varies from asymptomatic to severe in patients. The basis for this range in symptoms is unknown. One possibility is that genetic variation is partly responsible for the highly variable response. We evaluated how well a genetic risk score based on chromosomal-scale length variation and machine learning classification algorithms could predict severity of response to SARS-CoV-2 infection.
We compared 981 patients from the UK Biobank dataset who had a severe reaction to SARS-CoV-2 infection before 27 April 2020 to a similar number of age-matched patients drawn from the general UK Biobank population. For each patient, we built a profile of 88 numbers characterizing the chromosomal-scale length variability of their germ line DNA. Each number represented one quarter of one of the 22 autosomes (22 × 4 = 88). We used the machine learning algorithm XGBoost to build a classifier that could predict whether a person would have a severe reaction to COVID-19 based only on their 88-number profile.
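As an illustration of how an 88-number profile can be laid out, the following is a minimal sketch and not the authors' pipeline. It assumes each autosome comes with a list of (position, value) measurements and that each quarter is summarized by the mean of the values falling inside it. The names `build_profile`, `probes`, and `chrom_lengths`, the use of the mean, and the 0.0 fallback for empty quarters are all hypothetical choices made for this example.

```python
from statistics import mean

AUTOSOMES = range(1, 23)  # chromosomes 1..22
QUARTERS = 4              # each autosome is split into four equal parts


def build_profile(probes, chrom_lengths):
    """Return a list of 88 numbers, one per quarter of each autosome.

    probes:        dict mapping chromosome number -> list of (position, value)
    chrom_lengths: dict mapping chromosome number -> chromosome length
    Both inputs are hypothetical stand-ins for real genotyping data.
    """
    profile = []
    for chrom in AUTOSOMES:
        length = chrom_lengths[chrom]
        bins = [[] for _ in range(QUARTERS)]
        for pos, value in probes.get(chrom, []):
            # Map the position to a quarter index 0..3; clamp the far end.
            idx = min(pos * QUARTERS // length, QUARTERS - 1)
            bins[idx].append(value)
        for quarter_values in bins:
            # Assumption: a quarter with no measurements contributes 0.0.
            profile.append(mean(quarter_values) if quarter_values else 0.0)
    return profile
```

The resulting list could then serve as one row of the feature matrix handed to a classifier such as XGBoost, with one row per patient and a binary label marking severe cases.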
We found that the XGBoost classifier could differentiate between the two classes at a significant level (p = 2 · 10) as measured against a randomized control and (p = 3 · 10) as measured against the expected value of a random guessing algorithm (AUC = 0.5). However, we found that the AUC of the classifier was only 0.51, too low for a clinically useful test.
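To make the two quantities in this result concrete, here is a small sketch of how an AUC can be computed and compared against label-shuffled data. It is an assumption-laden illustration, not the statistical procedure used in the study: `auc` uses the standard rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half), and `permutation_p_value` is a hypothetical helper that estimates how often shuffled labels reach the observed AUC.

```python
import random


def auc(scores, labels):
    """Rank-based AUC for binary labels (1 = case, 0 = control)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Return (observed AUC, one-sided permutation p-value).

    Hypothetical helper: labels are shuffled n_perm times and the p-value
    is the fraction of shuffles whose AUC is at least the observed one.
    """
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A classifier that guesses at random has an expected AUC of 0.5. With enough samples, an AUC of 0.51 can differ from 0.5 with high statistical significance and still be far too weak to separate individual patients, which is the distinction the abstract draws.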
Genetics play a role in the severity of COVID-19, but we cannot yet develop a useful genetic test to predict severity.