Seo Hyungwoo, Chung Wonil
Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, South Korea.
Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
Genomics Inform. 2025 Sep 1;23(1):19. doi: 10.1186/s44342-025-00050-7.
The COVID-19 pandemic has highlighted the need for survival models that can assess risk factors and time-dependent effects in infectious diseases. However, the Cox proportional hazards (PH) model assumes that covariate effects are constant over time, so it struggles to capture changing disease dynamics. This motivates models that incorporate time-dependent coefficients and covariates for improved accuracy.
To model time-dependent effects and covariates, we applied a stratified Cox PH model with multiple time intervals so that the PH assumption is better satisfied within each interval. We conducted simulations to evaluate the performance of machine learning and deep learning survival models, including random survival forest (RSF), DeepSurv, and DeepHit. To improve time-dependent effect estimation, we introduced a refined time-interval division and a weighted sum approach for integrated hazard ratios of COVID-19 variants. The event of interest was death. We compared the risk of death between infected and uninfected individuals, with follow-up running from the start of the study to either death or the last follow-up.
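The interval-based approach described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `split_follow_up` and `weighted_log_hr`, the cut points, and the toy numbers are all hypothetical. The abstract does not state how the weights in the weighted sum are defined, so the sketch assumes user-supplied weights and assumes the combination is done on the log hazard ratio scale.

```python
import math


def split_follow_up(time, event, cut_points):
    """Split one subject's follow-up (0, time] into interval-specific rows.

    Returns a list of (interval_index, start, stop, event) tuples. The event
    flag is 1 only in the interval in which the subject died.
    """
    rows = []
    bounds = [0.0] + sorted(cut_points) + [math.inf]
    for k in range(len(bounds) - 1):
        start, stop = bounds[k], bounds[k + 1]
        if time <= start:
            break  # follow-up ended before this interval began
        end = min(time, stop)
        died_here = int(event == 1 and time <= stop)
        rows.append((k, start, end, died_here))
    return rows


def weighted_log_hr(hazard_ratios, weights):
    """Combine interval-specific hazard ratios into one integrated value.

    ASSUMPTION: a weighted average of log hazard ratios, exponentiated back.
    The weights (for example, the share of events per interval) are a
    hypothetical choice and are not taken from the source.
    """
    total = sum(weights)
    log_hr = sum(w * math.log(hr) for hr, w in zip(hazard_ratios, weights)) / total
    return math.exp(log_hr)


# Toy subject who died at day 250, with hypothetical cut points at days 100, 200, 300.
rows = split_follow_up(250, 1, [100, 200, 300])

# Toy interval-specific hazard ratios with equal weights.
integrated = weighted_log_hr([2.0, 8.0], [1, 1])
```

Each row produced by `split_follow_up` can serve as one record in a counting-process data set, with the interval index acting as the stratum in which a separate covariate effect is estimated.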
Our results showed that increasing the number of time intervals improved predictive accuracy. When the PH assumption held, the Cox PH model outperformed the machine learning and deep learning models. When we applied our approach to UK Biobank data, increasing the number of time intervals from five to fifteen improved performance. The previously reported hazard ratio of 7.333 for the pre-Delta period was refined to 29.359 for the Early variant, 20.734 for EU1, and 4.079 for Alpha, revealing a decline in risk across successive variants.
These findings suggest that refining time intervals improves the understanding of time-dependent effects in infectious diseases. Incorporating stratified intervals and advanced models enhances risk assessment and predictive accuracy for COVID-19 and other evolving diseases.