Department of Biostatistics, School of Public Health, Boston University, 801 Massachusetts Ave, CT 3rd Floor, Boston, MA, 02118, USA.
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe St E3009, Baltimore, MD, 21205, USA.
BMC Med Res Methodol. 2021 Feb 10;21(1):29. doi: 10.1186/s12874-021-01207-y.
Statistical methods for modeling longitudinal and time-to-event data have received much attention in medical research and are becoming increasingly useful. In clinical studies of diseases such as cancer and AIDS, longitudinal biomarkers are used to monitor disease progression and to predict survival. These longitudinal measures are often missing at failure times and may be prone to measurement error. More importantly, time-dependent survival models that include the raw longitudinal measurements may lead to biased results. In previous studies, these two types of data were frequently analyzed separately, with a mixed effects model used for the longitudinal data and a survival model applied to the event outcome.
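To make the measurement-error concern concrete, the following is a generic sketch of the two separately fitted models in commonly used notation. The symbols ($y_i$, $m_i$, $b_i$, $D$, $\sigma^2$, $h_0$, $w_i$, $\gamma$, $\alpha$) are assumptions introduced here for illustration and are not taken from the paper.

```latex
% Longitudinal submodel (linear mixed effects model), subject i, visit j:
\begin{align}
  y_i(t_{ij}) &= m_i(t_{ij}) + \varepsilon_{ij},
  & \varepsilon_{ij} &\sim N(0, \sigma^2), \\
  m_i(t) &= x_i(t)^\top \beta + z_i(t)^\top b_i,
  & b_i &\sim N(0, D).
\end{align}
% Time-dependent Cox model that plugs in the raw (error-prone) measurement:
\begin{equation}
  h_i(t) = h_0(t)\, \exp\!\left\{ w_i^\top \gamma + \alpha\, y_i(t) \right\}.
\end{equation}
```

Under this formulation the observed value $y_i(t)$ equals the underlying trajectory $m_i(t)$ plus noise, so a hazard model that uses $y_i(t)$ in place of $m_i(t)$ is fitting an error-contaminated covariate. This is one standard explanation for why the coefficient $\alpha$ can be estimated with bias.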
In this paper we compare joint maximum likelihood methods, a two-step approach, and a time-dependent covariate method, all of which link longitudinal data to survival data, with emphasis on using longitudinal measures to predict survival. We apply a Bayesian semi-parametric joint method and a maximum likelihood joint method that maximizes the joint likelihood of the time-to-event and longitudinal measures. We also implement the two-step approach, which estimates random effects separately, and a classic time-dependent covariate model. We use simulation studies to assess bias, accuracy, and coverage probabilities for the estimates of the link parameter that connects the longitudinal measures to survival times.
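As a rough illustration of how bias, accuracy, and coverage might be summarized across simulation replicates, here is a minimal Python sketch. The function name `summarize_link_estimates`, the use of root mean squared error as the accuracy measure, and the Wald-type 95% interval are all assumptions made for this example; the paper's own definitions may differ.

```python
import math


def summarize_link_estimates(estimates, std_errors, true_value, z=1.96):
    """Summarize simulation replicates of a link-parameter estimate.

    estimates  : one point estimate per simulated data set
    std_errors : the matching standard errors
    true_value : the link parameter used to generate the data
    z          : normal quantile for the Wald interval (1.96 ~ 95%)

    Hypothetical helper for illustration only.
    """
    n = len(estimates)
    if n == 0 or n != len(std_errors):
        raise ValueError("need equally sized, non-empty inputs")

    # Bias: average estimate minus the true value.
    bias = sum(estimates) / n - true_value

    # Accuracy, measured here as root mean squared error (an assumption).
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)

    # Coverage: share of Wald intervals that contain the true value.
    covered = sum(
        1
        for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )

    return {"bias": bias, "rmse": rmse, "coverage": covered / n}


if __name__ == "__main__":
    # Made-up numbers, not results from the paper.
    summary = summarize_link_estimates(
        estimates=[0.42, 0.47, 0.55, 0.51],
        std_errors=[0.05, 0.05, 0.05, 0.05],
        true_value=0.5,
    )
    print(summary)
```

A negative `bias` would correspond to underestimation of the link parameter, and a `coverage` well below 0.95 would suggest intervals that are too narrow or centered away from the true value.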
Simulation results demonstrate that the two-step approach performed best at estimating the link parameter when variability in the longitudinal measure was low, but was somewhat biased downwards when the variability was high. The Bayesian semi-parametric and maximum likelihood joint methods yielded higher link parameter estimates under both low and high variability in the longitudinal measure. The time-dependent covariate method consistently underestimated the link parameter. We illustrate these methods using data from the Framingham Heart Study, in which lipid measurements and myocardial infarction data were collected over a period of 26 years.
Traditional methods for modeling longitudinal and survival data that use the observed longitudinal data, such as the time-dependent covariate method, tend to provide downwardly biased estimates. The two-step approach and joint models provide better estimates, although a comparison of these methods may depend on the underlying residual variance.