Department of Internal Medicine, UNESP, Univ Estadual Paulista, Botucatu, Brazil.
Hospital do Rim, Universidade Federal de São Paulo, São Paulo, Brazil.
PLoS One. 2020 Feb 11;15(2):e0228842. doi: 10.1371/journal.pone.0228842. eCollection 2020.
One overlooked problem in statistical analysis is lateral collinearity, a phenomenon that may occur when the outcome variable derives from the predictors. In nephrology this issue is seen with the use of estimated glomerular filtration rate (eGFR) as an outcome and age, sex, and ethnicity as predictors. In this study with simulated data, we aim to illustrate this problem.
We randomly generated mutually unrelated data and used them to estimate eGFR with commonly used equations.
Using simulated data, we show that age, sex, and ethnicity (recycled predictor variables) are statistically significantly correlated with eGFR in linear regression analysis. Although the obvious initial conclusion is that age, sex, and ethnicity are strong predictors of eGFR, a more rigorous interpretation suggests that the association is a byproduct of the mathematical model: eGFR is itself calculated from these same variables.
While statistical models can identify vertical collinearity (predictor-predictor), lateral collinearity (predictor-outcome) is seldom identified or discussed in statistical analysis. Therefore, caution is needed when interpreting correlations of age, sex, and ethnicity with eGFR derived from regression analyses.
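The mechanism described above can be illustrated with a minimal sketch. The abstract does not say which eGFR equations or input distributions were used, so everything specific here is an assumption: the 4-variable MDRD equation stands in for the "common equations", the uniform ranges for age and serum creatinine are arbitrary, and Pearson correlation is used in place of a full linear regression. The function names (`mdrd_egfr`, `pearson_r`, `simulate`) are hypothetical. All inputs are drawn independently, and an extra `noise` variable that never enters the equation serves as a control.

```python
import math
import random


def mdrd_egfr(scr_mg_dl, age_years, female, black):
    """4-variable MDRD estimate (mL/min/1.73 m^2); the choice of equation is an assumption."""
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, seed=0):
    """Draw independent inputs, derive eGFR from them, and correlate each input with eGFR."""
    rng = random.Random(seed)
    # Assumed, arbitrary distributions; every variable is generated independently.
    age = [rng.uniform(18, 90) for _ in range(n)]
    female = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    black = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    scr = [rng.uniform(0.5, 1.5) for _ in range(n)]
    # Control variable that is NOT an input of the eGFR equation.
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

    egfr = [
        mdrd_egfr(s, a, f == 1.0, b == 1.0)
        for s, a, f, b in zip(scr, age, female, black)
    ]
    return {
        "age": pearson_r(age, egfr),
        "female": pearson_r(female, egfr),
        "black": pearson_r(black, egfr),
        "noise": pearson_r(noise, egfr),
    }


if __name__ == "__main__":
    for name, r in simulate().items():
        print(f"{name:>6}: r = {r:+.3f}")
```

In this sketch the recycled inputs (age, sex, ethnicity) correlate with eGFR purely because they appear in the formula that produces it, while the control variable does not. The magnitudes depend on the assumed distributions and should not be read as estimates from the study.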