Department of Psychological Sciences, Birkbeck, University of London, London, UK.
Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK.
Eur J Hum Genet. 2018 Aug;26(8):1194-1201. doi: 10.1038/s41431-018-0159-6. Epub 2018 Apr 30.
Many statistical tests rely on the assumption that the residuals of a model are normally distributed. Rank-based inverse normal transformation (INT) of the dependent variable is one of the most popular approaches to satisfying the normality assumption. When covariates are included in the analysis, a common approach is to first adjust for the covariates and then normalize the residuals. This study investigated the effect of regressing the dependent variable on the covariates and then applying rank-based INT to the residuals. The correlation between the dependent variable and the covariates was assessed at each stage of processing. An alternative approach was also tested, in which rank-based INT was applied to the dependent variable before regressing out the covariates. Analyses of both simulated and real data demonstrated that applying rank-based INT to the residuals after regressing out covariates re-introduces a linear correlation between the dependent variable and the covariates, increasing type-I error and reducing power. In contrast, when rank-based INT was applied before controlling for covariate effects, the residuals were normally distributed and linearly uncorrelated with the covariates. This latter approach is therefore recommended in situations where normality of the dependent variable is required.
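The two processing orders compared above can be illustrated with a minimal pure-Python sketch. This is not the authors' code. The following are all assumptions made for illustration: Blom's offset (c = 3/8) in the rank-to-quantile step, a single covariate fitted by ordinary least squares, simulated lognormal-style data, and the helper names (`rank_int`, `residualize`, `compare_orders`, etc.).

```python
import math
import random
from statistics import NormalDist, mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_int(values, c=3 / 8):
    """Rank-based inverse normal transformation.

    Assumption: Blom's offset c = 3/8, i.e. z = Phi^-1((r - c) / (n - 2c + 1)).
    """
    n = len(values)
    phi_inv = NormalDist().inv_cdf
    return [phi_inv((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


def residualize(y, x):
    """Residuals from an ordinary least-squares fit of y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def pearson(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)


def compare_orders(n=2000, seed=1):
    """Correlation with the covariate under the two processing orders.

    Hypothetical data: covariate x ~ N(0, 1), skewed outcome y = exp(x + e).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [math.exp(xi + rng.gauss(0, 0.5)) for xi in x]
    # Order 1: regress out the covariate, then apply rank-based INT to residuals.
    resid_then_int = rank_int(residualize(y, x))
    # Order 2: apply rank-based INT to y, then regress out the covariate.
    int_then_resid = residualize(rank_int(y), x)
    return pearson(resid_then_int, x), pearson(int_then_resid, x)


if __name__ == "__main__":
    r1, r2 = compare_orders()
    print(f"residualize -> INT: corr with covariate = {r1:+.4f}")
    print(f"INT -> residualize: corr with covariate = {r2:+.4f}")
```

OLS residuals are orthogonal to the covariate by construction, so the second order yields a correlation that is zero up to floating-point error. In the first order, the nonlinear rank-based INT is applied after that orthogonalization, and nothing keeps the transformed residuals uncorrelated with the covariate. With several covariates, the same logic would require a multiple regression in `residualize`; this sketch handles only one.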