Bruyndonckx Robin, Hens Niel, Aerts Marc
Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BIOSTAT), Hasselt University, Diepenbeek, Belgium.
Laboratory of Medical Microbiology, Vaccine & Infectious Disease Institute (VAXINFECTIO), University of Antwerp, Antwerp, Belgium.
Biom J. 2018 Jan;60(1):49-65. doi: 10.1002/bimj.201700025. Epub 2017 Oct 25.
Data in medical sciences often have a hierarchical structure, with lower-level units (e.g. children) nested in higher-level units (e.g. departments). Several specific but frequently studied settings, mainly in longitudinal and family research, involve a large number of units that tend to be quite small; units containing only one element are referred to as singletons. Regardless of sparseness, hierarchical data should be analyzed with appropriate methodology, for example linear mixed models. Using a simulation study based on the structure of a data example on Ceftriaxone consumption in hospitalized children, we assess the impact of an increasing proportion of singletons (0-95%), in data with a low, medium, or high intracluster correlation, on the stability of linear mixed model parameter estimates, confidence interval coverage, and F test performance. Techniques frequently used in the presence of singletons include ignoring clustering, dropping the singletons from the analysis, and grouping the singletons into an artificial unit. We show that both the fixed and random effects estimates and their standard errors are stable in the presence of an increasing proportion of singletons. We demonstrate that ignoring clustering and dropping singletons should be avoided, as both yield biased standard error estimates. Grouping the singletons into an artificial unit might be considered, although the linear mixed model performs better even when the proportion of singletons is high. We conclude that the linear mixed model is stable in the presence of singletons when both lower- and higher-level sample sizes are fixed. In this setting, the use of remedial measures, such as ignoring clustering and grouping or removing singletons, should be discouraged.
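The design setting and the remedial measures named in the abstract can be illustrated with a minimal Python sketch. It does not reproduce the authors' simulation: the cluster count (100), total sample size (500), ICC value (0.1), the even split of non-singleton observations, and all function names are hypothetical assumptions. The sketch keeps the number of clusters and the total sample size fixed while varying the proportion of singletons, draws random-intercept data with a given intracluster correlation, implements dropping and grouping of singletons, and uses the textbook design effect of the unweighted mean, 1 + ICC * (sum of n_j^2 / N - 1), as a simple proxy for how much ignoring clustering understates the variance. It does not fit a linear mixed model.

```python
import random
from collections import Counter


def cluster_sizes(n_clusters, n_total, prop_singletons):
    """Hypothetical design: fixed number of clusters and fixed total sample size.
    A proportion of the clusters are singletons; the remaining observations are
    spread as evenly as possible over the non-singleton clusters."""
    n_single = round(prop_singletons * n_clusters)
    n_multi = n_clusters - n_single
    remaining = n_total - n_single
    if n_multi == 0:
        if remaining != 0:
            raise ValueError("all clusters are singletons, so n_total must equal n_clusters")
        return [1] * n_single
    if remaining < 2 * n_multi:
        raise ValueError("too few observations to keep non-singleton clusters at size >= 2")
    base, extra = divmod(remaining, n_multi)
    return [1] * n_single + [base + 1] * extra + [base] * (n_multi - extra)


def simulate_random_intercept(sizes, icc, total_var=1.0, mean=0.0, seed=0):
    """Draw y_ij = mean + b_j + e_ij with Var(b_j) = icc * total_var and
    Var(e_ij) = (1 - icc) * total_var. Returns (cluster_id, y) pairs."""
    rng = random.Random(seed)
    sd_b = (icc * total_var) ** 0.5
    sd_e = ((1.0 - icc) * total_var) ** 0.5
    rows = []
    for j, n_j in enumerate(sizes):
        b_j = rng.gauss(0.0, sd_b)  # effect shared by all members of cluster j
        for _ in range(n_j):
            rows.append((j, mean + b_j + rng.gauss(0.0, sd_e)))
    return rows


def design_effect(sizes, icc):
    """Var(unweighted mean) under the random-intercept model divided by the
    variance assumed when clustering is ignored: 1 + icc * (sum n_j^2 / N - 1)."""
    n_total = sum(sizes)
    return 1.0 + icc * (sum(n * n for n in sizes) / n_total - 1.0)


def drop_singletons(rows):
    """Remedial measure: discard every cluster that contains one observation."""
    counts = Counter(g for g, _ in rows)
    return [(g, y) for g, y in rows if counts[g] > 1]


def group_singletons(rows, artificial_id=-1):
    """Remedial measure: pool all singletons into one artificial cluster."""
    counts = Counter(g for g, _ in rows)
    return [(artificial_id if counts[g] == 1 else g, y) for g, y in rows]


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.95):
        sizes = cluster_sizes(100, 500, p)
        rows = simulate_random_intercept(sizes, icc=0.1, seed=1)
        print(f"singletons={p:.0%} deff={design_effect(sizes, 0.1):.2f} "
              f"dropped_n={len(drop_singletons(rows))} "
              f"grouped_clusters={len({g for g, _ in group_singletons(rows)})}")
```

Under these hypothetical settings, the design effect rises from 1.40 with no singletons to 7.48 with 95% singletons: because the total sample size is fixed, the observations that are not singletons concentrate in fewer, larger clusters, so treating observations as independent understates the variance more, not less. Dropping singletons at 95% discards 95 of the 500 observations and 95 of the 100 clusters, while grouping leaves 6 clusters, one of which is artificial.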