I-BioStat, Center for Statistics, Universiteit Hasselt, Diepenbeek, B-3590, Belgium.
Arch Public Health. 2012 Apr 11;70(1):7. doi: 10.1186/0778-7367-70-7.
Binary and binomial outcomes are very common in medical and biomedical research. Such data are often collected longitudinally, with repeated measurements taken from each subject over time. On the one hand, this clusters the observations within subjects and induces correlation. On the other hand, the repeated binary outcomes from a given subject constitute a binomial outcome for which the prescribed mean-variance relationship is often violated, a phenomenon known as overdispersion.
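As an illustration of overdispersion (not taken from the study data), the sketch below simulates per-subject success counts in two ways: with a common success probability, and with a subject-specific probability drawn from a beta distribution. The function name `simulate_counts`, the parameter `rho`, and all numeric settings are hypothetical choices made for this example.

```python
import random
import statistics


def simulate_counts(n_subjects, n_visits, mean_p, rho=None, seed=0):
    """Simulate the number of successes out of n_visits for each subject.

    rho=None  -> every subject shares the same probability (plain binomial).
    rho=value -> each subject draws its own probability from a beta
                 distribution with mean mean_p (beta-binomial counts).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_subjects):
        if rho is None:
            p = mean_p
        else:
            # Beta(a, b) with a / (a + b) = mean_p and 1 / (a + b + 1) = rho
            total = 1.0 / rho - 1.0
            p = rng.betavariate(mean_p * total, (1.0 - mean_p) * total)
        counts.append(sum(rng.random() < p for _ in range(n_visits)))
    return counts


n_visits, mean_p, rho = 10, 0.3, 0.2

# Variance implied by the binomial mean-variance relationship: n * p * (1 - p)
binomial_var = n_visits * mean_p * (1 - mean_p)
# Beta-binomial variance inflates it by the factor 1 + (n - 1) * rho
beta_binomial_var = binomial_var * (1 + (n_visits - 1) * rho)

plain = simulate_counts(5000, n_visits, mean_p)
clustered = simulate_counts(5000, n_visits, mean_p, rho=rho)

print("binomial theory:     ", round(binomial_var, 2))
print("plain sample var:    ", round(statistics.variance(plain), 2))
print("beta-binomial theory:", round(beta_binomial_var, 2))
print("clustered sample var:", round(statistics.variance(clustered), 2))
```

The two simulated samples have the same mean, but the variance of the counts with subject-specific probabilities is well above the binomial value, which is the pattern that overdispersion refers to.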
We consider two longitudinal binary data sets collected in southwestern Ethiopia: the Jimma infant growth study, which follows children's early growth, and the Jimma longitudinal family survey of youth, which follows adolescents' school attendance over time. We apply a new model, known as the combined model, that accommodates overdispersion and correlation simultaneously. For comparison, we also consider commonly used methods for binary and binomial data: the simple logistic model, which accounts for neither overdispersion nor correlation; the beta-binomial model, which accommodates only overdispersion; and the logistic-normal model, which accommodates only correlation. As an alternative estimation technique, a Bayesian implementation of the combined model is also presented.
The combined model improves model fit and is therefore preferred, based on likelihood comparison and the deviance information criterion (DIC). Further, the two estimation approaches yield fairly similar parameter estimates and inferences in both case studies. Early initiation of breastfeeding has a protective effect against the risk of overweight in late infancy (p = 0.001), while the proportion of overweight infants appears not to differ between males and females over time (p = 0.66). Gender is significantly associated with school attendance: girls have a lower attendance rate than boys (p = 0.001).
We applied a flexible modeling framework to analyze binary and binomial longitudinal data. Rather than accounting for overdispersion and correlation separately, the framework accommodates both simultaneously by including two separate sets of random effects at once: beta random effects for overdispersion and normal random effects for correlation.
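The abstract does not give the model equations. One common way to write a combined model of this kind for a binary outcome is sketched below; the notation ($Y_{ij}$, $\theta_{ij}$, $\kappa_{ij}$, $\boldsymbol{\xi}$, $\mathbf{b}_i$, $D$, $\alpha$, $\beta$) is assumed here for illustration and may differ from the full paper.

```latex
% Y_{ij}: binary outcome for subject i at occasion j (assumed notation)
\begin{align*}
Y_{ij} \mid \theta_{ij}, \mathbf{b}_i &\sim \text{Bernoulli}(\pi_{ij}),
  \qquad \pi_{ij} = \theta_{ij}\,\kappa_{ij}, \\
\operatorname{logit}(\kappa_{ij}) &= \mathbf{x}_{ij}^{\top}\boldsymbol{\xi}
  + \mathbf{z}_{ij}^{\top}\mathbf{b}_i, \\
\mathbf{b}_i &\sim N(\mathbf{0}, D)
  \quad \text{(normal random effects: correlation)}, \\
\theta_{ij} &\sim \text{Beta}(\alpha, \beta)
  \quad \text{(beta random effects: overdispersion)}.
\end{align*}
```

Under this formulation, fixing $\theta_{ij} \equiv 1$ leaves the logistic-normal model, fixing $\mathbf{b}_i \equiv \mathbf{0}$ leaves a beta-binomial-type model, and fixing both leaves simple logistic regression, which matches the set of comparison models described above.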