Park Seho, Chung Yujin
Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Department of Applied Statistics, Kyonggi University, Suwon 16227, Korea.
Genomics Inform. 2022 Sep;20(3):e34. doi: 10.5808/gi.22052. Epub 2022 Sep 30.
Multilevel analysis is an appropriate and powerful tool for analyzing hierarchically structured data, and it is widely applied in fields ranging from public health to genomics. In practice, however, information on some nesting levels may be lost, either because the data fail to capture every level of the hierarchy or because the top or intermediate levels are ignored in the analysis. In this study, we consider a multilevel linear mixed-effects model (LMM) with single imputation that can incorporate all levels of the data hierarchy when top- or intermediate-level clusters are missing. We evaluate its performance against that of models that ignore the data hierarchy or the missing intermediate-level clusters. To this end, we applied the multilevel LMM with single imputation and the competing models to hierarchically structured cohort data in which some intermediate levels were missing, and to simulated data with various cluster sizes and missing rates of intermediate-level clusters. A thorough simulation study demonstrated that, in terms of mean squared error and coverage probability, the LMM with single imputation estimates the fixed-effect coefficients and variance components of a multilevel model more accurately than models that ignore the data hierarchy or the missing clusters. In particular, models that ignored the data hierarchy or the missing clusters overestimated the variance components of the random effects. The analysis of the hierarchically structured cohort data yielded similar results.
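The overestimation of variance components reported above can be illustrated with a small simulation. The sketch below is not the authors' model or code: it omits fixed effects and imputation, and it replaces LMM fitting with classical method-of-moments (ANOVA) estimators on a balanced three-level random-intercept design. All function names, cluster sizes, and standard deviations are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def simulate(a, b, n, sd_top, sd_mid, sd_res, seed=0):
    """Balanced three-level data: a top clusters, b mid clusters each, n units each.

    Returns data[i][j][k] = u_i + v_ij + e_ijk (random intercepts only).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(a):
        u = rng.gauss(0.0, sd_top)          # top-level random intercept
        top = []
        for _ in range(b):
            v = rng.gauss(0.0, sd_mid)      # intermediate-level random intercept
            top.append([u + v + rng.gauss(0.0, sd_res) for _ in range(n)])
        data.append(top)
    return data


def three_level_anova(data):
    """Method-of-moments variance components using all three levels."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    mid_means = [[mean(cell) for cell in top] for top in data]
    top_means = [mean(row) for row in mid_means]
    grand = mean(top_means)
    ms_res = sum(
        (y - mid_means[i][j]) ** 2
        for i, top in enumerate(data)
        for j, cell in enumerate(top)
        for y in cell
    ) / (a * b * (n - 1))
    ms_mid = n * sum(
        (mid_means[i][j] - top_means[i]) ** 2
        for i in range(a)
        for j in range(b)
    ) / (a * (b - 1))
    ms_top = b * n * sum((t - grand) ** 2 for t in top_means) / (a - 1)
    return {
        "top": (ms_top - ms_mid) / (b * n),
        "mid": (ms_mid - ms_res) / n,
        "res": ms_res,
    }


def two_level_anova(data):
    """Same estimators after discarding the intermediate-level labels."""
    a = len(data)
    flat = [[y for cell in top for y in cell] for top in data]
    m = len(flat[0])                         # units per top-level cluster
    top_means = [mean(group) for group in flat]
    grand = mean(top_means)
    ms_within = sum(
        (y - top_means[i]) ** 2 for i, group in enumerate(flat) for y in group
    ) / (a * (m - 1))
    ms_top = m * sum((t - grand) ** 2 for t in top_means) / (a - 1)
    return {"top": (ms_top - ms_within) / m, "res": ms_within}


# Hypothetical settings: every true variance component equals 1.
data = simulate(a=200, b=5, n=10, sd_top=1.0, sd_mid=1.0, sd_res=1.0, seed=1)
three = three_level_anova(data)
two = two_level_anova(data)
print("three-level estimates:", three)
print("two-level estimates (intermediate level ignored):", two)
```

In this balanced setting, the variance of the ignored intermediate level does not disappear. It is absorbed by the top-level and residual components, so both exceed their three-level counterparts, which agrees in direction with the overestimation described in the abstract.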