Aloisio Kathryn M, Swanson Sonja A, Micali Nadia, Field Alison, Horton Nicholas J
Smith College, Northampton, MA.
Harvard School of Public Health, Boston, MA.
Stata J. 2014 Oct 1;14(4):863-883.
Clustered data arise in many settings, particularly within the social and biomedical sciences. As an example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies, where researchers use various informants (e.g. parent and adolescent) to provide a holistic view of a subject's symptomatology. Fitzmaurice et al. (1995) have described estimation of multiple-source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data because of the additional stages of consent and assent they require. The usual GEE is unbiased when missingness is Missing Completely at Random (MCAR) in the sense of Little and Rubin (2002). This is a strong assumption that may not be tenable. Other options, such as weighted generalized estimating equations (WEEs), are computationally challenging when missingness is non-monotone. Multiple imputation is an attractive method for fitting incomplete-data models because it requires only the less restrictive Missing at Random (MAR) assumption. Previously, estimation with partially observed clustered data was computationally challenging; however, recent developments in Stata have made these methods practical to use. We demonstrate how to use multiple imputation in conjunction with a GEE to investigate the prevalence of disordered eating symptoms in adolescents, as reported by parents and by adolescents, as well as factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children (ALSPAC), a cohort study that enrolled more than 14,000 pregnant mothers in 1991-92 and has followed the health and development of their children at regular intervals. Point estimates were fairly similar to those from the GEE under MCAR, but the MAR model had smaller standard errors while requiring less stringent assumptions regarding missingness.
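Multiple imputation creates m completed datasets, fits the analysis model (here, a GEE) to each one, and then pools the results. The pooled standard error is what the MAR-versus-MCAR comparison above is about, so a small sketch of the standard pooling step (Rubin's rules) may help. This is a minimal Python illustration under stated assumptions: a single scalar coefficient, with its estimate and squared standard error taken from each imputed-data fit. The function name `pool_rubin` and the example numbers are hypothetical and do not come from the paper. In Stata, this combination step is carried out by `mi estimate`.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed-data fits using Rubin's rules.

    estimates: point estimate from each completed dataset
    variances: squared standard error from each completed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, math.sqrt(t)


# Hypothetical GEE log-odds estimates from m = 3 imputed datasets
est, se = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")
```

The `(1 + 1/m) * b` term inflates the pooled variance to reflect uncertainty about the imputed values. A single-imputation analysis that treated the filled-in data as observed would omit this term.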