Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA.
Clinical Research Unit, Rafik Hariri University Hospital, Beirut 2010, Lebanon.
Viruses. 2022 Jun 28;14(7):1414. doi: 10.3390/v14071414.
The rapid spread of coronavirus disease 2019 (COVID-19) has imposed clinical and financial burdens on hospitals and governments attempting to provide patients with medical care and implement disease-control policies. The transmissibility of the disease has been shown to correlate with the patient's viral load, which can be measured during testing using the cycle threshold (Ct). Previous models have used Ct to forecast the trajectory of the spread, which can provide valuable information for allocating resources and adjusting policies. However, these models either included other variables specific to individual medical institutions or took the form of compartmental models that rely on epidemiological assumptions, both of which can introduce prediction uncertainty. In this study, we overcome these limitations using data-driven modeling based on two institution-independent variables: Ct and the previous number of cases. We collected data from three groups of patients (n = 6296, n = 3228, and n = 12,096) from different time periods to train, validate, and independently validate the models. We used three machine learning algorithms and three deep learning algorithms that can model the temporal dynamics of the number of cases. The endpoint was the number of cases 7 weeks ahead, and predictions were evaluated using mean squared error (MSE). The sequence-to-sequence model gave the best predictions during validation (MSE = 0.025), while polynomial regression (OLS) and support vector machine regression (SVR) performed better during independent validation (MSE = 0.1596 and MSE = 0.16754, respectively), indicating better generalizability of the latter two models. The OLS and SVR models were then applied to a dataset from an external institution and showed promise in predicting COVID-19 incidence across institutions. These models may support clinical and logistic decision-making after prospective validation.
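As a rough illustration of the forecasting setup described above, the following Python sketch pairs each week's mean Ct and case count with the case count 7 weeks later, fits a polynomial regression by ordinary least squares, and scores it with MSE. It is not the authors' pipeline: the weekly aggregation, the polynomial degree, the split point, the synthetic data, and all function names (`make_supervised`, `poly_features`, `fit_ols`, `predict`, `mse`) are assumptions made for this example only.

```python
import numpy as np


def make_supervised(ct, cases, horizon=7):
    """Pair each week's (mean Ct, case count) with the case count `horizon` weeks later."""
    ct = np.asarray(ct, dtype=float)
    cases = np.asarray(cases, dtype=float)
    X = np.column_stack([ct[:-horizon], cases[:-horizon]])
    y = cases[horizon:]
    return X, y


def poly_features(X, degree=2):
    """Bias column plus per-feature powers up to `degree` (no cross terms, for brevity)."""
    X = np.asarray(X, dtype=float)
    cols = [np.ones(len(X))]
    for d in range(1, degree + 1):
        cols.append(X ** d)
    return np.column_stack(cols)


def fit_ols(X, y, degree=2):
    """Ordinary least squares on the polynomial design matrix."""
    coef, *_ = np.linalg.lstsq(poly_features(X, degree), y, rcond=None)
    return coef


def predict(coef, X, degree=2):
    return poly_features(X, degree) @ coef


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))


# Synthetic weekly series (assumption: illustrative only, not study data).
rng = np.random.default_rng(0)
weeks = 60
t = np.arange(weeks)
ct = 25 + 3 * np.sin(t / 6) + rng.normal(0, 0.3, weeks)          # weekly mean Ct
cases = 0.5 - 0.4 * np.sin(t / 6) + rng.normal(0, 0.05, weeks)   # scaled case counts

X, y = make_supervised(ct, cases, horizon=7)

# Chronological split: earlier weeks for fitting, later weeks for validation.
split = 40
coef = fit_ols(X[:split], y[:split])
val_mse = mse(y[split:], predict(coef, X[split:]))
print(f"validation MSE: {val_mse:.4f}")
```

The split is chronological rather than random, loosely mirroring the abstract's use of patient groups from different time periods, so the model is always evaluated on weeks that come after the ones it was fitted on.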