Health Data Science, University of Liverpool Faculty of Health and Life Sciences, Liverpool, UK.
Int J Biostat. 2021 Mar 26;18(1):279-292. doi: 10.1515/ijb-2019-0159.
Mixed models are a useful way of analysing longitudinal data. Random effects terms allow modelling of patient-specific deviations from the overall trend over time. Correlation between repeated measurements is captured by specifying a joint distribution for all random effects in a model. Typically, this joint distribution is assumed to be a multivariate normal distribution. For Gaussian outcomes, misspecification of the random effects distribution usually has little impact. However, when the outcome is discrete (e.g. counts or binary outcomes), generalised linear mixed models (GLMMs) are used to analyse longitudinal trends. Opinion is divided about how robust GLMMs are to misspecification of the random effects. Previous work explored the impact of random effects misspecification on the bias of model parameters in single-outcome GLMMs. Accepting that these model parameters may be biased, we investigate whether this affects our ability to classify patients into clinical groups using a longitudinal discriminant analysis. We also consider multiple outcomes, which can significantly increase the dimension of the random effects distribution when modelled simultaneously. We show that when there is a severe departure from normality, more flexible mixture distributions can give better classification accuracy. However, in many cases, wrongly assuming a single multivariate normal distribution has little impact on classification accuracy.
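The classification step of a longitudinal discriminant analysis can be illustrated with a minimal sketch. It assumes two clinical groups whose GLMMs have already been fitted, a single binary outcome, and a one-dimensional random intercept, so it does not cover the multi-outcome, multivariate case studied in the paper. Each patient's marginal likelihood under each group's model is obtained by integrating the random intercept out numerically (here with a plain trapezoid rule, a stand-in for the adaptive quadrature or sampling a real GLMM fit would use), and Bayes' rule turns these into posterior group probabilities. The random intercept follows a mixture of normals; a single component recovers the usual normal assumption. The names `GroupModel`, `marginal_loglik` and `posterior_groups` and all parameter values are hypothetical, and this is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GroupModel:
    """Hypothetical fitted random-intercept logistic GLMM for one clinical group.

    logit P(y_ij = 1 | b_i) = beta0 + beta1 * t_ij + b_i, where the random
    intercept b_i follows a mixture of normals. One component gives the usual
    normal random-effects assumption.
    """
    beta0: float
    beta1: float
    weights: List[float]  # mixture weights, assumed to sum to 1
    means: List[float]
    sds: List[float]


def normal_pdf(x: float, mu: float, sd: float) -> float:
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def re_density(b: float, m: GroupModel) -> float:
    """Density of the random intercept under the (mixture) distribution."""
    return sum(w * normal_pdf(b, mu, sd)
               for w, mu, sd in zip(m.weights, m.means, m.sds))


def conditional_loglik(y, t, b, m: GroupModel) -> float:
    """Bernoulli log-likelihood of one patient's series given the random intercept b."""
    ll = 0.0
    for yij, tij in zip(y, t):
        eta = m.beta0 + m.beta1 * tij + b
        # log(1 + e^eta), written to avoid overflow for large eta
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        ll += yij * eta - softplus
    return ll


def marginal_loglik(y, t, m: GroupModel, n_grid: int = 2001) -> float:
    """log of the integral of p(y | b) g(b) db, by the trapezoid rule over +/- 8 SD."""
    lo = min(mu - 8.0 * sd for mu, sd in zip(m.means, m.sds))
    hi = max(mu + 8.0 * sd for mu, sd in zip(m.means, m.sds))
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * h
        wk = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        total += wk * math.exp(conditional_loglik(y, t, b, m)) * re_density(b, m)
    return math.log(total * h)


def posterior_groups(y, t, models, priors) -> List[float]:
    """Bayes' rule: P(group g | y) is proportional to prior_g * p(y | group g)."""
    logp = [math.log(p) + marginal_loglik(y, t, m) for m, p in zip(models, priors)]
    top = max(logp)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(v - top) for v in logp]
    s = sum(unnorm)
    return [u / s for u in unnorm]


if __name__ == "__main__":
    # Illustrative parameter values only, not estimates from the paper.
    group_a = GroupModel(-1.0, 0.1, [1.0], [0.0], [1.0])  # normal random intercept
    group_b = GroupModel(0.5, 0.4, [0.5, 0.5], [-1.0, 1.5], [0.5, 0.5])  # bimodal mixture
    times = [0, 1, 2, 3, 4]
    for series in ([1, 1, 1, 1, 1], [0, 0, 0, 0, 0], [0, 0, 1, 1, 1]):
        probs = posterior_groups(series, times, [group_a, group_b], [0.5, 0.5])
        label = "A" if probs[0] >= probs[1] else "B"
        print(series, [round(p, 3) for p in probs], "->", label)
```

Refitting `group_b` with a single normal component in place of the mixture, and comparing the resulting labels on simulated patients, is one way to probe the question the abstract poses: whether a misspecified random-effects distribution changes classification, not just parameter estimates.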