Gao Fei, Zeng Donglin, Lin Dan-Yu
Department of Biostatistics, University of Washington, Seattle, Washington, U.S.A.
Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, U.S.A.
Biometrics. 2018 Dec;74(4):1213-1222. doi: 10.1111/biom.12911. Epub 2018 Jun 5.
Interval-censored data arise when the event time of interest can only be ascertained through periodic examinations. In medical studies, subjects may not complete the examination schedule for reasons related to the event of interest. In this article, we develop a semiparametric approach to adjust for such informative dropout in regression analysis of interval-censored data. Specifically, we propose a broad class of joint models, under which the event time of interest follows a transformation model with a random effect and the dropout time follows a different transformation model but with the same random effect. We consider nonparametric maximum likelihood estimation and develop an EM algorithm that involves simple and stable calculations. We prove that the resulting estimators of the regression parameters are consistent, asymptotically normal, and asymptotically efficient with a covariance matrix that can be consistently estimated through profile likelihood. In addition, we show how to consistently estimate the survival function when dropout represents voluntary withdrawal and the cumulative incidence function when dropout is an unavoidable terminal event. Furthermore, we assess the performance of the proposed numerical and inferential procedures through extensive simulation studies. Finally, we provide an application to data on the incidence of diabetes from a major epidemiological cohort study.
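To make the data structure described in the abstract concrete, the following is a minimal simulation sketch, not the authors' code or their estimation procedure. It generates interval-censored event times whose examination schedule is cut short by a dropout time that shares a random effect with the event time. Everything specific here is an assumption chosen for illustration: the function name `simulate_informative_dropout`, the single binary covariate, the normal random effect, the constant baseline hazards of 0.2, and the proportional hazards form (one special case of a transformation model).

```python
import numpy as np


def simulate_informative_dropout(n=500, beta=0.5, gamma=0.3, sigma=1.0,
                                 visit_gap=1.0, n_visits=6, seed=0):
    """Simulate interval-censored event times with shared-frailty dropout.

    Illustrative assumptions only: proportional hazards for both times,
    constant baseline hazards, and a normal random effect b shared by both.
    """
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)        # one binary covariate
    b = rng.normal(0.0, sigma, size=n)      # shared random effect

    # Event time T and dropout time D are conditionally independent given
    # (x, b), but dependent marginally because both hazards involve b.
    rate_t = 0.2 * np.exp(beta * x + b)
    rate_d = 0.2 * np.exp(gamma * x + b)
    t = rng.exponential(1.0 / rate_t)
    d = rng.exponential(1.0 / rate_d)

    visits = visit_gap * np.arange(1, n_visits + 1)   # scheduled exams
    tau = visits[-1]                                  # end of follow-up
    d_obs = np.minimum(d, tau)                        # dropout, censored at tau
    dropped = d <= tau

    # The event is only known to lie in (left, right], where left and right
    # are attended examinations; right = inf means the event was never seen.
    left = np.zeros(n)
    right = np.full(n, np.inf)
    for i in range(n):
        attended = visits[visits <= d[i]]     # exams before dropout
        before = attended[attended < t[i]]
        after = attended[attended >= t[i]]
        if before.size:
            left[i] = before[-1]
        if after.size:
            right[i] = after[0]

    # The true t and d are returned only so the simulation can be checked;
    # an analyst would observe x, left, right, d_obs, and dropped.
    return x, left, right, d_obs, dropped, t, d
```

In this sketch, subjects with a large random effect tend to both experience the event early and drop out early, so an event occurring after dropout is never bracketed by an examination. That is the informative dropout the abstract refers to: ignoring the shared random effect would treat those missed events as if they were non-informatively censored.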