Carlin J B, Wolfe R, Coffey C, Patton G C
Clinical Epidemiology and Biostatistics Unit, Royal Children's Hospital Research Institute and University of Melbourne, Department of Paediatrics, Royal Children's Hospital, Parkville, Vic 3052, Australia.
Stat Med. 1999 Oct 15;18(19):2655-79. doi: 10.1002/(sici)1097-0258(19991015)18:19<2655::aid-sim202>3.0.co;2-#.
Longitudinal studies are increasingly popular in epidemiology. In this tutorial we provide a detailed review of methods used by us in the analysis of a longitudinal (multiwave or panel) study of adolescent health, focusing on smoking behaviour. This example is explored in detail with the principal aim of providing an introduction to the analysis of longitudinal binary data, at a level suited to statisticians familiar with logistic regression and survival analysis but not necessarily experienced in longitudinal analysis or estimating equation methods. We describe recent advances in statistical methodology that can play a practical role in applications and are available with standard software. Our approach emphasizes the importance of stating clear research questions, and for binary outcomes we suggest these are best organized around the key epidemiological concepts of prevalence and incidence. For prevalence questions, we show how unbiased estimating equations and information-sandwich variance estimates may be used to produce a valid and robust analysis, as long as sample size is reasonably large. We also show how the estimating equation approach readily extends to accommodate adjustments for missing data and complex survey design. A detailed discussion of gender-related differences over time in our smoking outcome is used to emphasize the need for great care in separating longitudinal from cross-sectional information. We show how incidence questions may be addressed using a discrete-time version of the proportional hazards regression model. This approach has the advantages of providing estimates of relative risks, being feasible with standard software, and also allowing robust information-sandwich variance estimates.
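As a minimal sketch of the prevalence idea above, consider the simplest possible case: a single overall prevalence estimated from repeated binary observations on the same participants. The abstract gives no formulas, so everything here is an illustrative assumption: an intercept-only model on the probability scale with an independence working correlation, made-up data, and a hypothetical function name `prevalence_with_sandwich_se`. The estimating equation sum(y - p) = 0 gives the overall proportion, and the information-sandwich variance sums residuals within each participant before squaring, which allows for correlation between waves.

```python
from math import sqrt


def prevalence_with_sandwich_se(clusters):
    """Estimate one prevalence and its standard errors.

    clusters: list of lists of 0/1 outcomes, one inner list per participant
    (hypothetical layout, chosen for illustration only).
    Returns (p_hat, naive_se, robust_se).
    """
    n_obs = sum(len(c) for c in clusters)
    # Solving sum over all observations of (y - p) = 0 gives the overall mean.
    p_hat = sum(sum(c) for c in clusters) / n_obs

    # "Bread": minus the derivative of the estimating function, here n_obs.
    # "Meat": squared within-participant sums of residuals.
    meat = sum((sum(c) - len(c) * p_hat) ** 2 for c in clusters)
    robust_se = sqrt(meat) / n_obs

    # Naive binomial standard error, which treats all observations as independent.
    naive_se = sqrt(p_hat * (1.0 - p_hat) / n_obs)
    return p_hat, naive_se, robust_se


# Made-up data: six participants, three waves each, strongly correlated within person.
waves = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
p_hat, naive_se, robust_se = prevalence_with_sandwich_se(waves)
print(p_hat, naive_se, robust_se)
```

With within-person correlation as strong as in this made-up example, the robust standard error exceeds the naive one. When every participant contributes a single observation, the two coincide. A real analysis with covariates would use a fitted regression model in standard software rather than this hand calculation.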
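The incidence approach can be sketched in a similar spirit. The abstract does not state the model formula; the sketch below assumes the common grouped-time form of the proportional hazards model, in which the hazard at wave t is h_t(x) = 1 - exp(-exp(alpha_t + beta x)), so that a binary regression with a complementary log-log link can be fitted to person-period records. The function names, parameter values, and data layout are hypothetical.

```python
from math import exp, log


def discrete_hazard(alpha_t, beta, x):
    """Hazard at one wave under the assumed complementary log-log model."""
    return 1.0 - exp(-exp(alpha_t + beta * x))


def person_period(first_event_wave, n_waves):
    """Expand one participant into (wave, event) rows.

    Rows stop at the wave of the first event, or at n_waves if no event
    was observed (first_event_wave is None).
    """
    last = first_event_wave if first_event_wave is not None else n_waves
    return [(w, int(w == first_event_wave)) for w in range(1, last + 1)]


# Hypothetical wave-specific baselines and a covariate effect of log(1.5).
alphas = [-2.0, -1.5, -1.8]
beta = log(1.5)

# Under this model log(1 - h_t(x)) = -exp(alpha_t + beta * x), so the ratio
# below equals exp(beta) at every wave: the proportional hazards property.
ratios = [
    log(1.0 - discrete_hazard(a, beta, 1)) / log(1.0 - discrete_hazard(a, beta, 0))
    for a in alphas
]
print(ratios)

# A participant whose first event occurs at wave 2, and one who never has it.
print(person_period(2, 3))
print(person_period(None, 3))
```

Because the ratio is constant across waves, exp(beta) plays the role of the relative risk (hazard ratio) mentioned in the abstract. Fitting the model then amounts to a binary regression on the stacked person-period rows, which is why standard software suffices.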