Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China.
Faculty of Science, The University of Hong Kong, Pokfulam, Hong Kong SAR, China.
BMC Infect Dis. 2024 Aug 9;24(1):803. doi: 10.1186/s12879-024-09699-x.
Predicting an individual's risk of death from COVID-19 is essential for planning and optimising resources. However, because the real-world mortality rate is relatively low, particularly in places like Hong Kong, the resulting datasets are highly imbalanced, which makes building an accurate prediction model difficult. This study introduces an application of graph convolutional networks (GCNs) to predict COVID-19 patient survival using a highly imbalanced dataset. Unlike traditional models, GCNs leverage structural relationships within the data, which can improve predictive accuracy and robustness. By integrating demographic and laboratory data into a GCN framework, our approach addresses class imbalance and demonstrates significant improvements in prediction accuracy.
The cohort included all consecutive COVID-19-positive patients who fulfilled the study criteria and were admitted to 42 public hospitals in Hong Kong between January 23 and December 31, 2020 (n = 7,606). We proposed a population-based graph convolutional network (GCN) model that took blood test results, age and sex as inputs to predict survival outcomes. We compared the proposed model with the Cox Proportional Hazard (CPH) model, conventional machine learning models, and oversampling machine learning models. Additionally, a subgroup analysis was performed on the test set to gain a deeper understanding of the relationship between each patient node and its neighbours and to reveal possible underlying causes of inaccurate predictions.
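To make the population-graph idea concrete, the following is a minimal sketch of one graph-convolution step over a patient graph. It is not the authors' implementation: the abstract does not state how edges were defined, so the k-nearest-neighbour rule, the function names (`knn_adjacency`, `normalise_adjacency`, `gcn_layer`), the random weights and the synthetic feature matrix are all assumptions made for illustration. The propagation rule is the commonly used symmetric normalisation with self-loops.

```python
import numpy as np

def knn_adjacency(features, k):
    """Symmetric k-nearest-neighbour adjacency (Euclidean distance, no self-loops).

    Assumption: the edge rule is hypothetical; the abstract does not specify it.
    """
    n = features.shape[0]
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # a patient is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)                # make the graph undirected

def normalise_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, h, weight):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(a_norm @ h @ weight, 0.0)

# Synthetic stand-in for patient features (e.g. age, sex, blood test values).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))                      # 6 patients, 4 features
a_norm = normalise_adjacency(knn_adjacency(x, k=2))
w = rng.normal(size=(4, 3))                      # untrained, illustrative weights
h1 = gcn_layer(a_norm, x, w)                     # hidden representation per patient
```

Each row of `h1` mixes a patient's own features with those of its graph neighbours, which is the sense in which a GCN uses structural relationships within the data.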
The GCN model was the top-performing model, with an AUC of 0.944, considerably outperforming all other models (p < 0.05), including the oversampled CPH model (0.708), linear regression (0.877), Linear Discriminant Analysis (0.860), K-nearest neighbours (0.834), Gaussian predictor (0.745) and support vector machine (0.847). With Kaplan-Meier estimates, the GCN model demonstrated good discriminability between low- and high-risk individuals (p < 0.0001). Based on subanalysis using the weighted-in score, although the GCN model was able to discriminate well between different predicted groups, the separation was inadequate between false negative (FN) and true negative (TN) groups.
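The AUC figures above are rank-based, which is why they remain informative when deaths are rare. As an illustrative sketch (not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are made up for this example), the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half:

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy imbalanced data: 8 survivors (0) and 2 deaths (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.05, 0.15, 0.6, 0.25, 0.7, 0.5]

auc = roc_auc(labels, scores)                    # 15 of 16 pairs ordered correctly
constant_auc = roc_auc(labels, np.zeros(10))     # an uninformative model

# Accuracy of always predicting "survives" looks good despite carrying no signal.
majority_accuracy = np.mean(np.asarray(labels) == 0)
```

In this toy case the constant predictor reaches 80% accuracy but an AUC of only 0.5, whereas the informative scores give an AUC of 0.9375.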
The GCN model considerably outperformed all other machine learning methods and baseline CPH models. Thus, when applied to this imbalanced COVID survival dataset, adopting a population graph representation may be an approach to achieving good prediction.