Biostatistics Research Group, Department of Health Sciences, University of Leicester-Centre for Medicine, Leicester, UK.
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Stat Med. 2019 Oct 15;38(23):4477-4502. doi: 10.1002/sim.8309. Epub 2019 Jul 21.
Survival models incorporating random effects to account for unmeasured heterogeneity are increasingly used in biostatistical and applied research. Specifically, unmeasured covariates whose omission from the model would lead to biased, inefficient results are commonly handled by including a subject-specific (or cluster-specific) frailty term that follows a given distribution (eg, gamma or lognormal). Despite this, in the context of parametric frailty models, little is known about the impact of misspecifying the baseline hazard, the frailty distribution, or both. Therefore, our aim is to quantify the impact of such misspecification in a wide variety of clinically plausible scenarios via Monte Carlo simulation, using open-source software readily available to applied researchers. We generate clustered survival data assuming various baseline hazard functions, including mixture distributions with turning points, and assess the impact of sample size, variance of the frailty, baseline hazard function, and frailty distribution. Models compared include standard parametric distributions and more flexible spline-based approaches; we also include semiparametric Cox models. The resulting bias can be clinically relevant. Our results show the importance of assessing model fit with respect to the baseline hazard function and the distribution of the frailty: misspecifying the former leads to biased relative and absolute risk estimates, whereas misspecifying the latter affects absolute risk estimates and measures of heterogeneity. We illustrate our conclusions with two applications using data on diabetic retinopathy and bladder cancer. In conclusion, we highlight the importance of fitting models that are flexible enough and of assessing model fit.
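The data-generating step described in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' simulation code: it assumes a single Weibull baseline hazard h0(t) = lam * p * t^(p-1) rather than the mixture distributions mentioned above, a shared gamma frailty with mean 1 and variance theta, one binary covariate with log hazard ratio beta, and administrative censoring at max_follow_up. The function name, parameter names, and all numeric values are hypothetical.

```python
import math
import random


def simulate_clustered_survival(n_clusters, cluster_size, theta, lam, p, beta,
                                max_follow_up, seed=None):
    """Simulate clustered survival times with a shared gamma frailty.

    Assumed (illustrative) model: conditional hazard
    h(t | u, x) = u * lam * p * t**(p - 1) * exp(beta * x),
    where u is the cluster-level frailty and x a binary covariate.
    """
    rng = random.Random(seed)
    rows = []
    for cluster in range(n_clusters):
        # Gamma frailty with shape 1/theta and scale theta:
        # mean = 1, variance = theta.
        frailty = rng.gammavariate(1.0 / theta, theta)
        for _ in range(cluster_size):
            treated = rng.randint(0, 1)
            # Inverse-transform sampling from the conditional survival
            # function S(t) = exp(-frailty * lam * t**p * exp(beta * x)).
            u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
            rate = frailty * lam * math.exp(beta * treated)
            event_time = (-math.log(u) / rate) ** (1.0 / p)
            # Administrative censoring at the end of follow-up.
            observed = min(event_time, max_follow_up)
            rows.append({
                "cluster": cluster,
                "frailty": frailty,
                "treated": treated,
                "time": observed,
                "event": int(event_time <= max_follow_up),
            })
    return rows


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    data = simulate_clustered_survival(
        n_clusters=200, cluster_size=5, theta=0.5,
        lam=0.1, p=1.5, beta=-0.5, max_follow_up=5.0, seed=1,
    )
    n_events = sum(row["event"] for row in data)
    print(f"{len(data)} subjects, {n_events} events")
```

In a misspecification study of the kind described, data generated this way would then be fitted with models whose baseline hazard or frailty distribution differs from the generating one (for example, a lognormal frailty fitted to gamma-frailty data), and the estimates compared against the known true values across many replications.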