Linear regression and the normality assumption.

Affiliations

Faculty of Population Health, Institute of Cardiovascular Science, University College London, London WC1E 6BT, United Kingdom; Groningen Research Institute of Pharmacy, University of Groningen, Groningen, The Netherlands; Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht, The Netherlands.

Faculty of Population Health, Institute of Cardiovascular Science, University College London, London WC1E 6BT, United Kingdom.

Publication information

J Clin Epidemiol. 2018 Jun;98:146-151. doi: 10.1016/j.jclinepi.2017.12.006. Epub 2017 Dec 16.

Abstract

OBJECTIVES

Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that, in large data settings, such transformations are often unnecessary and, worse, may bias model estimates.

STUDY DESIGN AND SETTING

Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage, i.e., the proportion of simulated datasets in which the 95% confidence interval included the true slope coefficient.
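The coverage criterion can be made concrete with a small simulation. The sketch below is not the authors' simulation code; the data-generating values (intercept 1, slope 0.5, x drawn from Uniform(0, 10), centred exponential errors) are hypothetical choices. It fits the model by ordinary least squares, builds a 95% confidence interval from the classical standard error, and counts how often the interval contains the true slope. For simplicity it uses the normal critical value (about 1.96) in place of the t quantile, which is only a reasonable approximation for samples of roughly 50 or more.

```python
import math
import random
from statistics import NormalDist


def ols_slope_ci(x, y, level=0.95):
    """Fit y = a + b*x by ordinary least squares.

    Returns (b_hat, lower, upper) using the classical (constant-variance)
    standard error. The normal critical value is a large-sample stand-in
    for the t quantile.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual variance estimate with n - 2 degrees of freedom
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return b, b - z * se_b, b + z * se_b


def simulate_coverage(n, n_sims, true_slope=0.5, seed=2018):
    """Return (mean slope estimate, coverage) over n_sims simulated datasets.

    Errors follow a centred exponential distribution: mean zero and constant
    variance, but strongly right-skewed, so normality is violated.
    """
    rng = random.Random(seed)
    slopes, hits = [], 0
    for _ in range(n_sims):
        x = [rng.uniform(0, 10) for _ in range(n)]
        y = [1.0 + true_slope * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
        b, lo, hi = ols_slope_ci(x, y)
        slopes.append(b)
        hits += lo <= true_slope <= hi  # True counts as 1
    return sum(slopes) / n_sims, hits / n_sims


if __name__ == "__main__":
    for n in (50, 500):
        mean_b, cov = simulate_coverage(n, n_sims=500)
        print(f"n={n}: mean slope={mean_b:.3f}, coverage={cov:.3f}")
```

With skewed errors of this kind, the mean slope estimate should sit close to 0.5 and coverage should be close to 0.95 at the larger sample size.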

RESULTS

Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary for unbiased estimation of standard errors, and hence of confidence intervals and P-values. However, with large sample sizes (e.g., more than 10 observations per variable), violations of the normality assumption often do not noticeably affect results. By contrast, assumptions about the parametric model, the absence of extreme observations, homoscedasticity, and independence of the errors remain influential even in large samples.
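The contrast between a violated normality assumption and a violated homoscedasticity assumption can be sketched by running a coverage check at a fixed, fairly large n under two hypothetical error models: skewed errors with constant variance, and normal errors whose standard deviation grows with the squared distance of x from 5 (an arbitrary choice made to produce strong heteroscedasticity). Both error models and all parameter values are assumptions for illustration, not the authors' settings. Because the classical standard error assumes constant variance, only the second scenario should show coverage clearly below 95%.

```python
import math
import random
from statistics import NormalDist


def slope_ci(x, y, z=NormalDist().inv_cdf(0.975)):
    """OLS slope with a classical (constant-variance) 95% confidence interval."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b - z * se, b + z * se


def skewed_homoscedastic(rng, xi):
    # Hypothetical: non-normal (centred exponential) errors, constant variance;
    # xi is unused but kept so both error models share one signature
    return rng.expovariate(1.0) - 1.0


def normal_heteroscedastic(rng, xi):
    # Hypothetical: normal errors whose SD grows with (x - 5) squared
    return rng.gauss(0.0, 0.1 * (xi - 5.0) ** 2)


def coverage(error_fn, n=500, n_sims=500, true_slope=0.5, seed=7):
    """Share of simulated datasets whose 95% CI contains true_slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.uniform(0, 10) for _ in range(n)]
        y = [1.0 + true_slope * xi + error_fn(rng, xi) for xi in x]
        lo, hi = slope_ci(x, y)
        hits += lo <= true_slope <= hi
    return hits / n_sims


if __name__ == "__main__":
    print("skewed, constant variance:", coverage(skewed_homoscedastic))
    print("normal, non-constant variance:", coverage(normal_heteroscedastic))
```

Increasing `n` should move the first scenario closer to 0.95 while leaving the second well below it, since the mismatch between the classical standard error and the true sampling variability does not shrink as the sample grows.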

CONCLUSION

Given that modern healthcare research typically includes thousands of subjects, focusing on the normality assumption is often unnecessary, does not guarantee valid results, and, worse, may bias estimates through the practice of outcome transformations.
