Marston Louise, Peacock Janet L, Yu Keming, Brocklehurst Peter, Calvert Sandra A, Greenough Anne, Marlow Neil
Department of Primary Care and Population Health, Computing and Mathematics, Brunel University, London, UK.
Paediatr Perinat Epidemiol. 2009 Jul;23(4):380-92. doi: 10.1111/j.1365-3016.2009.01046.x.
Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods that can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets that differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except maximum likelihood multilevel modelling with Gauss-Hermite quadrature (ML GH 'xtlogit' in Stata), gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples), there appears to be less need to adjust for clustering.
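To illustrate what "adjusting standard errors for clustering" involves, here is a minimal sketch; it is not the authors' analysis code (the study used Stata commands such as xtreg, xtmixed and xtlogit). For simplicity it swaps in a linear model with one covariate instead of logistic regression, and computes the cluster-robust (sandwich) standard error of the slope, treating all infants from the same multiple birth as one cluster. The function name `ols_with_cluster_se` and the toy data are hypothetical, and no finite-sample correction is applied (Stata's `vce(cluster)` multiplies the variance by G/(G-1)·(N-1)/(N-k)).

```python
from collections import defaultdict
from math import sqrt


def ols_with_cluster_se(x, y, cluster):
    """Hypothetical helper: fit y = a + b*x by OLS.

    Returns (a, b, naive_se_b, cluster_se_b). cluster_se_b is the
    cluster-robust (sandwich) SE of b, with no finite-sample correction.
    """
    n = len(x)
    # Entries of X'X and X'y for the design matrix with columns [1, x]
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^{-1}
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Naive model-based SE: s^2 (X'X)^{-1}, which assumes independent infants
    s2 = sum(e * e for e in resid) / (n - 2)
    naive_se_b = sqrt(s2 * inv[1][1])

    # Sum the score X_g' e_g within each cluster (singleton, twin pair, triplets)
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, ei, g in zip(x, resid, cluster):
        scores[g][0] += ei
        scores[g][1] += xi * ei

    # "Meat" of the sandwich: sum over clusters of outer products of scores
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for u0, u1 in scores.values():
        meat[0][0] += u0 * u0
        meat[0][1] += u0 * u1
        meat[1][0] += u1 * u0
        meat[1][1] += u1 * u1

    # V = inv @ meat @ inv; only the (1, 1) element is needed for the slope
    row = [sum(inv[1][k] * meat[k][j] for k in range(2)) for j in range(2)]
    var_b = sum(row[j] * inv[j][1] for j in range(2))
    return a, b, naive_se_b, sqrt(var_b)


if __name__ == "__main__":
    # Made-up toy data; cluster ids mark infants sharing a mother
    x = [28, 30, 31, 31, 33, 29, 29, 34]
    y = [80, 85, 90, 88, 95, 78, 79, 97]
    cluster = [1, 2, 3, 3, 4, 5, 5, 6]  # clusters 3 and 5 are twin pairs
    a, b, naive_se, cluster_se = ols_with_cluster_se(x, y, cluster)
    print(f"slope={b:.3f} naive SE={naive_se:.3f} cluster SE={cluster_se:.3f}")
```

Summing residual scores within each cluster before squaring is what stops correlated twins or triplets from counting as fully independent observations. When every cluster has size 1 the estimate reduces to the ordinary heteroskedasticity-robust (HC0) standard error, and if every record were duplicated as an identical "twin", the naive standard error would shrink while the cluster-robust one would stay unchanged.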