Data Distribution: Normal or Abnormal?

Affiliations

Past President, World Association of Medical Editors (WAME), Editorial Consultant, The Lancet, Associate Editor, Frontiers in Epidemiology.

Publication information

J Korean Med Sci. 2024 Jan 22;39(3):e35. doi: 10.3346/jkms.2024.39.e35.

Abstract

Determining whether the frequency distribution of a data set follows a normal distribution is among the first steps of data analysis. Visual examination of the data, commonly with a Q-Q plot, is accepted by many scientists but is considered subjective, and therefore unacceptable, by other researchers. The one-sample Kolmogorov-Smirnov test with Lilliefors correction (for a sample size ≥ 50) and the Shapiro-Wilk test (for a sample size < 50) are common statistical tests for checking the normality of a data set quantitatively. Because parametric tests, which assume that the data distribution is normal (Gaussian, bell-shaped), are generally more powerful than their non-parametric counterparts when that assumption holds, we commonly use transformations (e.g., log-transformation or Box-Cox transformation) to bring the frequency distribution of non-normally distributed data close to a normal distribution. Herein, I show how to work with these statistical methods in practice by examining real data sets.
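The workflow summarized in the abstract can be illustrated with a small sketch. It is not taken from the article: the data are simulated (log-normal), the function names `qq_points` and `qq_correlation` are hypothetical, and the statistic used is the correlation between the two axes of a normal Q-Q plot (a simple numeric summary of the plot), not the Shapiro-Wilk or Lilliefors-corrected Kolmogorov-Smirnov test named above. It assumes Blom plotting positions, which is one common convention among several.

```python
import math
import random
from statistics import NormalDist, fmean


def qq_points(data):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    observed = sorted(data)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Blom plotting positions (i - 0.375) / (n + 0.25): an assumed convention.
    theoretical = [
        std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)
    ]
    return theoretical, observed


def qq_correlation(data):
    """Pearson correlation of the Q-Q points; values near 1 suggest normality."""
    x, y = qq_points(data)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated right-skewed data (illustrative only, not the article's data).
random.seed(0)
skewed = [random.lognormvariate(0.0, 1.0) for _ in range(200)]

# Log-transformation, one of the transformations mentioned in the abstract.
logged = [math.log(v) for v in skewed]

r_raw = qq_correlation(skewed)
r_log = qq_correlation(logged)
print(f"Q-Q correlation, raw data:        {r_raw:.3f}")
print(f"Q-Q correlation, log-transformed: {r_log:.3f}")
```

For these simulated data the log-transformed values should lie much closer to a straight line on the Q-Q plot than the raw values. In practice, a formal test from a statistics package would normally replace the correlation summary used here.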

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d49c/10803211/6dff49ff0c4d/jkms-39-e35-g001.jpg
