Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, P.O. Box 1122, Blindern, N-0317, Oslo, Norway.
BMC Med Res Methodol. 2019 Feb 7;19(1):28. doi: 10.1186/s12874-019-0667-2.
It is common in applied epidemiological and clinical research to convert continuous variables into categorical variables by grouping values into categories. Such categorized variables are then often used as exposure variables in a regression model. There are numerous statistical arguments for why this practice should be avoided, and in this paper we present yet another such argument.
We show that categorization may lead to spurious interaction in multiple regression models. We give precise analytical expressions for when this may happen in the linear regression model with normally distributed exposure variables, and we show by simulations that the analytical results are valid also for other distributions. Further, we give an interpretation of the results in terms of a measurement error problem.
We show that, in the case of a linear model with two normally distributed exposure variables, both categorized at the same cut point, a spurious interaction will be induced unless the two variables are categorized at the median or they are uncorrelated. In simulations with exposure variables following other distributions, we confirm this general effect of categorization, but we also show that the effect of the choice of cut point varies across distributions.
Categorization of continuous exposure variables leads to a number of problems, among them spurious interaction effects. Hence, this practice should be avoided and other methods should be considered.
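The result for two normally distributed exposures can be illustrated with a small simulation. The sketch below is not the authors' simulation code. It assumes an outcome generated as Y = X1 + X2 + noise with no true interaction, unit coefficients, unit noise variance, and illustrative values for the correlation (0.5) and the cut point (0 for the median, 1 otherwise). The function name `interaction_after_dichotomizing` is hypothetical. Both exposures are dichotomized at the same cut point, and a linear model with a product term is fitted by ordinary least squares.

```python
import numpy as np


def interaction_after_dichotomizing(rho, cut, n=200_000, seed=0):
    """Estimate the interaction coefficient after dichotomizing both exposures.

    Assumed data-generating model (illustrative, not from the paper):
    Y = X1 + X2 + e, with (X1, X2) standard bivariate normal with
    correlation rho and e standard normal. There is no true interaction.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = x[:, 0] + x[:, 1] + rng.normal(size=n)

    # Dichotomize both exposures at the same cut point.
    z1 = (x[:, 0] > cut).astype(float)
    z2 = (x[:, 1] > cut).astype(float)

    # Fit Y ~ 1 + Z1 + Z2 + Z1*Z2 by ordinary least squares.
    design = np.column_stack([np.ones(n), z1, z2, z1 * z2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[3]  # coefficient of the product term Z1*Z2


if __name__ == "__main__":
    print("correlated, cut at median:", interaction_after_dichotomizing(0.5, 0.0))
    print("correlated, cut at 1:     ", interaction_after_dichotomizing(0.5, 1.0))
    print("uncorrelated, cut at 1:   ", interaction_after_dichotomizing(0.0, 1.0))
```

Under these assumptions, the estimated product-term coefficient should be close to zero in the first and third settings and clearly different from zero in the second, in line with the result stated above.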