Department of Epidemiology and Biostatistics, School of Public Health, Curtin University, Perth, Australia; Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK.
Department of Epidemiology and Biostatistics, School of Public Health, Curtin University, Perth, Australia.
Comput Methods Programs Biomed. 2020 Apr;187:105196. doi: 10.1016/j.cmpb.2019.105196. Epub 2019 Nov 15.
In longitudinal epidemiological studies consisting of a baseline stage and a follow-up stage, observations at the baseline stage may contain a non-negligible proportion of negative responses. The time-to-event outcomes of observations with a negative response at baseline can be recorded as zeros, which standard survival analysis excludes. Consequently, important information on these subjects is lost in the analysis. Furthermore, subjects are often clustered within hospitals, communities or health service centers, resulting in correlated observations. The two-part model framework has been developed and used widely to analyze semi-continuous data or count data with excess zeros, but its application to clustered time-to-event data with clumping at zero remains sparse.
A two-part mixed-effects modeling approach was proposed. A logistic mixed-effects regression model was used in the first part to determine factors associated with the prevalence of the baseline event of interest. Parametric frailty models (including Weibull, exponential, log-logistic and log-normal) were used in the second part to assess associations between exposures and time-to-event outcomes. Correlated random effects were incorporated within the two regression models to accommodate the inherent correlation within each clustering unit and the correlation between the two parts. As an illustrative example, the method was applied to exclusive breastfeeding data from a community-based prospective cohort study in Nepal.
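To make the data structure concrete, the following is a minimal simulation sketch of clustered time-to-event data with clumping at zero. It is not the authors' implementation: the function name `simulate_two_part`, all parameter values, the single binary exposure, and the choice to let the second-part random effect act on the log Weibull scale (so that a positive effect lengthens duration) are assumptions made for illustration only.

```python
import math
import random


def simulate_two_part(n_clusters=50, cluster_size=20, rho=0.67,
                      sd_u=1.0, sd_v=0.5,
                      beta0=0.5, beta1=0.8,
                      gamma0=1.0, gamma1=0.3,
                      shape=1.5, seed=1):
    """Simulate (cluster, exposure, baseline_positive, time) records.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for cluster in range(n_clusters):
        # Draw a pair of correlated cluster-level random effects:
        # u enters the logistic part, v enters the survival part,
        # and corr(u, v) = rho links the two parts.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = sd_u * z1
        v = sd_v * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        for _ in range(cluster_size):
            x = 1 if rng.random() < 0.5 else 0  # binary exposure
            # Part 1: logistic mixed-effects model for the baseline event.
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x + u)))
            positive = 1 if rng.random() < p else 0
            if positive:
                # Part 2: Weibull duration whose scale depends on the
                # exposure and on the cluster effect v (an assumption).
                scale = math.exp(gamma0 + gamma1 * x + v)
                t = rng.weibullvariate(scale, shape)
            else:
                # Negative response at baseline: duration recorded as zero.
                t = 0.0
            rows.append((cluster, x, positive, t))
    return rows


rows = simulate_two_part()
n_zero = sum(1 for r in rows if r[3] == 0.0)
print(f"{len(rows)} subjects, {n_zero} with a zero duration")
```

With a positive `rho`, clusters with a higher baseline prevalence also tend to have longer durations, which is the kind of between-part dependence the correlated random effects are meant to capture. Censoring is omitted here to keep the sketch short.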
A significant positive correlation between the baseline prevalence of exclusive breastfeeding and exclusive breastfeeding duration was confirmed (ρ = 0.67, P < 0.001). The correlated two-part model outperformed the independent two-part model (likelihood ratio test statistic = 8.6, df = 1, P = 0.003).
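The reported likelihood ratio test can be checked by hand. The sketch below assumes the usual chi-square reference distribution with 1 degree of freedom, for which the upper-tail probability has the closed form erfc(sqrt(x / 2)); the helper name `lrt_pvalue_df1` is hypothetical.

```python
import math


def lrt_pvalue_df1(statistic):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    if statistic < 0:
        raise ValueError("a likelihood ratio statistic cannot be negative")
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


p_value = lrt_pvalue_df1(8.6)
print(f"P = {p_value:.3f}")
```

The result rounds to 0.003, which agrees with the P value reported for the comparison of the correlated and independent two-part models.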
Unlike conventional survival analysis, which excludes subjects with a zero event time, the proposed approach makes full use of all available information at baseline and during follow-up. In addition to breastfeeding studies, the method can be applied to other research areas where clustered time-to-event data with clumping at zero arise.