t-tests, non-parametric tests, and large studies--a paradox of statistical practice?

Affiliation

Unit of Biostatistics and Epidemiology, Oslo University Hospital, Oslo, N-0407, Norway.

Publication information

BMC Med Res Methodol. 2012 Jun 14;12:78. doi: 10.1186/1471-2288-12-78.

DOI:10.1186/1471-2288-12-78
PMID:22697476
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3445820/
Abstract

BACKGROUND

During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences.

METHODS

A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation.
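The kind of comparison described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation: the abstract does not name the distributions used, so the sketch draws from lognormal distributions with equal means and different spread (their medians differ, unlike the paper's design). The function names, parameter values, and the large-sample normal approximations for both p-values are choices made for this example only.

```python
import math
import random


def t_test_pvalue(x, y):
    """Two-sided two-sample t-test p-value (unpooled variances).

    Uses a normal approximation to the t distribution, which is an
    assumption that is only reasonable for large samples. With equal
    group sizes the pooled and unpooled statistics coincide.
    """
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return math.erfc(abs(z) / math.sqrt(2))


def wmw_pvalue(x, y):
    """Two-sided WMW p-value via the normal approximation.

    Assumes continuous data: no midranks and no tie correction.
    """
    nx, ny = len(x), len(y)
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(r for r, (_, g) in enumerate(labelled, start=1) if g == 0)
    u = rank_sum_x - nx * (nx + 1) / 2
    mean_u = nx * ny / 2
    sd_u = math.sqrt(nx * ny * (nx + ny + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rates(n, sigma_x, sigma_y, reps=200, alpha=0.05, seed=1):
    """Return (t-test rate, WMW rate) of p < alpha over `reps` simulated studies."""
    rng = random.Random(seed)
    # Pick the log-scale locations so that both lognormal means equal exp(0.5).
    mu_x = 0.5 - sigma_x ** 2 / 2
    mu_y = 0.5 - sigma_y ** 2 / 2
    t_hits = wmw_hits = 0
    for _ in range(reps):
        x = [rng.lognormvariate(mu_x, sigma_x) for _ in range(n)]
        y = [rng.lognormvariate(mu_y, sigma_y) for _ in range(n)]
        t_hits += t_test_pvalue(x, y) < alpha
        wmw_hits += wmw_pvalue(x, y) < alpha
    return t_hits / reps, wmw_hits / reps


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for n in (50, 200, 500):
        t_rate, wmw_rate = rejection_rates(n, sigma_x=1.0, sigma_y=1.2)
        print(f"n={n:4d}  t-test: {t_rate:.2f}  WMW: {wmw_rate:.2f}")
```

Because the two means are equal by construction, the t-test rejection rate should stay near the nominal level, while the WMW rate should grow with the group size.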

RESULTS

The WMW test produces, on average, smaller p-values than the t-test. This discrepancy increases with increasing sample size, skewness, and difference in spread. For heavily skewed data, the proportion of p<0.05 with the WMW test can be greater than 90% if the standard deviations differ by 10% and the number of observations is 1000 in each group. The high rejection rates of the WMW test should be interpreted as the power to detect that the probability that a random sample from one of the distributions is less than a random sample from the other distribution is greater than 50%.
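The quantity the WMW test responds to, the probability that an observation from one group is smaller than an observation from the other, can be estimated directly from two samples. The sketch below is an illustration with made-up toy data (not from the paper); the function name is hypothetical, and ties are counted as one half, which is a common convention assumed here.

```python
from statistics import mean, median


def prob_x_less_than_y(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) by comparing every pair."""
    score = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                score += 1.0
            elif xi == yj:
                score += 0.5  # ties count as half (assumed convention)
    return score / (len(x) * len(y))


# Toy data invented for this example: same mean and same median,
# but x is right-skewed and more spread out than y.
x = [0, 0, 4, 4, 12]
y = [3, 3, 4, 5, 5]

print(mean(x), mean(y))          # both means are 4
print(median(x), median(y))      # both medians are 4
print(prob_x_less_than_y(x, y))  # 0.6
```

Even though the means and medians agree, the estimate is 0.6 rather than 0.5, which is the kind of difference a WMW test will detect once the samples are large enough.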

CONCLUSIONS

Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.
