Data Distribution: Normal or Abnormal?

Affiliation

Past President, World Association of Medical Editors (WAME), Editorial Consultant, The Lancet, Associate Editor, Frontiers in Epidemiology.

Publication information

J Korean Med Sci. 2024 Jan 22;39(3):e35. doi: 10.3346/jkms.2024.39.e35.

DOI: 10.3346/jkms.2024.39.e35
PMID: 38258367
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10803211/
Abstract

Determining whether the frequency distribution of a given data set follows a normal distribution is among the first steps of data analysis. Visual examination of the data, commonly with a Q-Q plot, is accepted by many scientists but is considered subjective and unacceptable by other researchers. The one-sample Kolmogorov-Smirnov test with Lilliefors correction (for a sample size ≥ 50) and the Shapiro-Wilk test (for a sample size < 50) are common statistical tests for checking the normality of a data set quantitatively. Because parametric tests, which assume that the data distribution is normal (Gaussian, bell-shaped), are generally more powerful than their non-parametric counterparts when that assumption holds, we commonly use transformations (e.g., log-transformation or Box-Cox transformation) to bring the frequency distribution of non-normally distributed data close to a normal distribution. Herein, I show how to work with these statistical methods in practice by examining real data sets.
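
The workflow outlined in the abstract can be sketched in Python as follows. This is an illustrative sketch only, not the author's code: the function names (`ks_distance`, `lilliefors_test`, `check_normality`), the synthetic log-normal data, and the 0.05 significance level are assumptions introduced here. The Lilliefors p-value is approximated by Monte Carlo simulation rather than taken from published tables, and the 50-observation cut-off follows the rule of thumb stated in the abstract.

```python
import numpy as np
from scipy import stats


def ks_distance(samples):
    """KS distance between each row and a normal with that row's own mean/SD."""
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    n = samples.shape[1]
    mean = samples.mean(axis=1, keepdims=True)
    sd = samples.std(axis=1, ddof=1, keepdims=True)
    cdf = stats.norm.cdf(np.sort((samples - mean) / sd, axis=1))
    upper = np.arange(1, n + 1) / n  # empirical CDF just after each point
    lower = np.arange(0, n) / n      # empirical CDF just before each point
    return np.maximum((upper - cdf).max(axis=1), (cdf - lower).max(axis=1))


def lilliefors_test(x, n_sim=2000, seed=0):
    """KS test with estimated parameters; p-value approximated by simulation."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    observed = ks_distance(x)[0]
    simulated = ks_distance(rng.standard_normal((n_sim, x.size)))
    p_value = (1 + np.sum(simulated >= observed)) / (n_sim + 1)
    return float(observed), float(p_value)


def check_normality(x, alpha=0.05):
    """Shapiro-Wilk for n < 50, Lilliefors-corrected KS otherwise."""
    x = np.asarray(x, dtype=float)
    if x.size < 50:
        name = "Shapiro-Wilk"
        stat, p = stats.shapiro(x)
    else:
        name = "Kolmogorov-Smirnov (Lilliefors)"
        stat, p = lilliefors_test(x)
    return {"test": name, "statistic": float(stat), "p": float(p),
            "reject_normality": bool(p < alpha)}


# Synthetic right-skewed (log-normal) data; hypothetical, not from the article.
rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=200)

print("raw:    ", check_normality(skewed))
print("log:    ", check_normality(np.log(skewed)))

# Box-Cox needs strictly positive data; lambda is estimated by maximum likelihood.
transformed, lam = stats.boxcox(skewed)
print("Box-Cox:", check_normality(transformed), "lambda =", round(float(lam), 3))
```

A Box-Cox lambda close to 0 indicates that the fitted transformation is close to a plain log-transformation. A formal test is usually read alongside a Q-Q plot, which `scipy.stats.probplot` can draw.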

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d49c/10803211/3fb9ff51ed22/jkms-39-e35-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d49c/10803211/6dff49ff0c4d/jkms-39-e35-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d49c/10803211/5fd944d318ec/jkms-39-e35-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d49c/10803211/53c23a78182f/jkms-39-e35-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d49c/10803211/697c91dfa249/jkms-39-e35-g004.jpg
