Permutation tests are robust and powerful at 0.5% and 5% significance levels.

Author information

Noguchi Kimihiro, Konietschke Frank, Marmolejo-Ramos Fernando, Pauly Markus

Affiliations

Department of Mathematics, Western Washington University, Bellingham, WA, 98225, USA.

Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, Berlin, 10117, Germany.

Publication information

Behav Res Methods. 2021 Dec;53(6):2712-2724. doi: 10.3758/s13428-021-01595-5. Epub 2021 May 28.

DOI: 10.3758/s13428-021-01595-5

PMID: 34050436

Abstract

The recent replication crisis has led to a number of ad hoc suggestions for decreasing the chance of false-positive findings. Among them, Johnson (Proceedings of the National Academy of Sciences, 110, 19313-19317, 2013) and Benjamin et al. (Nature Human Behaviour, 2, 6-10, 2018) recommend using a significance level of α = 0.005 (0.5%) instead of the conventional 0.05 (5%) level. Although this suggestion is easy to implement, it is unclear whether commonly used statistical tests are robust and/or powerful at such a small significance level. The main aim of our study is therefore to investigate the robustness and power-curve behavior of independent (unpaired) two-sample tests for metric and ordinal data at nominal significance levels of α = 0.005 and α = 0.05. An extensive simulation study shows that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful, whereas commonly used two-sample tests that rely on the t-distribution tend to be either liberal or conservative and exhibit peculiar power-curve behavior under skewed distributions with variance heterogeneity.
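
To make the idea of a "permutation version" concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' code, which this record does not include. It computes either the studentized Welch t statistic or the Brunner-Munzel statistic, recomputes it after randomly reassigning the pooled observations to the two groups, and returns a two-sided Monte Carlo p-value. The function names, the (hits + 1) / (n_perm + 1) p-value convention, and the default n_perm = 9999 are assumptions of this sketch, and each sample needs at least two observations.

```python
import math
import random
import statistics


def _safe_ratio(num, den):
    """Divide; a zero denominator maps to 0.0 (for 0/0) or a signed infinity."""
    if den > 0:
        return num / den
    if num == 0:
        return 0.0
    return math.copysign(math.inf, num)


def welch_stat(x, y):
    """Welch t statistic: mean difference over the unpooled standard error."""
    se2 = statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    return _safe_ratio(statistics.mean(x) - statistics.mean(y), math.sqrt(se2))


def midranks(values):
    """1-based ranks; tied values share the average of their sorted positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def brunner_munzel_stat(x, y):
    """Brunner-Munzel statistic built from pooled and within-sample midranks."""
    n1, n2 = len(x), len(y)
    pooled = midranks(list(x) + list(y))
    r1, r2 = pooled[:n1], pooled[n1:]
    w1, w2 = midranks(x), midranks(y)
    m1, m2 = sum(r1) / n1, sum(r2) / n2
    # Variance estimates S_1^2 and S_2^2 of the placement-type scores.
    s1 = sum((r - w - m1 + (n1 + 1) / 2) ** 2 for r, w in zip(r1, w1)) / (n1 - 1)
    s2 = sum((r - w - m2 + (n2 + 1) / 2) ** 2 for r, w in zip(r2, w2)) / (n2 - 1)
    num = n1 * n2 * (m2 - m1)
    den = (n1 + n2) * math.sqrt(n1 * s1 + n2 * s2)
    return _safe_ratio(num, den)


def permutation_test(x, y, stat, n_perm=9999, seed=0):
    """Two-sided Monte Carlo permutation p-value for a studentized statistic."""
    rng = random.Random(seed)
    n1 = len(x)
    pooled = list(x) + list(y)
    observed = abs(stat(x, y))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        if abs(stat(pooled[:n1], pooled[n1:])) >= observed:
            hits += 1
    # Counting the observed split itself keeps p >= 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Illustrative ordinal scores (hypothetical data, not from the study).
    a = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1]
    b = [3, 3, 4, 3, 1, 2, 3, 1, 1, 5, 4]
    for name, stat in [("Welch", welch_stat), ("Brunner-Munzel", brunner_munzel_stat)]:
        print(name, permutation_test(a, b, stat))
```

Because the smallest attainable p-value in this sketch is 1 / (n_perm + 1), a run with n_perm = 199 can never go below 0.005, so decisions at the 0.5% level call for many more permutations than decisions at the 5% level.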

Similar articles

Permutation tests are robust and powerful at 0.5% and 5% significance levels.

Behav Res Methods. 2021 Dec;53(6):2712-2724. doi: 10.3758/s13428-021-01595-5. Epub 2021 May 28.

Performance of five two-sample location tests for skewed distributions with unequal variances.

Contemp Clin Trials. 2009 Sep;30(5):490-6. doi: 10.1016/j.cct.2009.06.007. Epub 2009 Jul 2.

Comparison of profile-likelihood-based confidence intervals with other rank-based methods for the two-sample problem in ordered categorical data.

J Biopharm Stat. 2023 May 4;33(3):371-385. doi: 10.1080/10543406.2022.2152831. Epub 2022 Dec 19.

Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method.

Stat Med. 2017 Jun 30;36(14):2187-2205. doi: 10.1002/sim.7263. Epub 2017 Mar 9.

Permutation-based inference for the AUC: A unified approach for continuous and discontinuous data.

Biom J. 2016 Nov;58(6):1319-1337. doi: 10.1002/bimj.201500105. Epub 2016 Aug 9.

Location tests for biomarker studies: a comparison using simulations for the two-sample case.

Methods Inf Med. 2013;52(4):351-9. doi: 10.3414/ME12-02-0014. Epub 2013 Jul 23.

The Wilcoxon-Mann-Whitney test under scrutiny.

Stat Med. 2009 May 1;28(10):1487-97. doi: 10.1002/sim.3561.

Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials.

PLoS One. 2018 Apr 19;13(4):e0195894. doi: 10.1371/journal.pone.0195894. eCollection 2018.

Estimating p-values in small microarray experiments.

Bioinformatics. 2007 Jan 1;23(1):38-43. doi: 10.1093/bioinformatics/btl548. Epub 2006 Oct 30.

Implementing continuous non-normal skewed distributions in latent growth mixture modeling: An assessment of specification errors and class enumeration.

Multivariate Behav Res. 2019 Nov-Dec;54(6):795-821. doi: 10.1080/00273171.2019.1593813. Epub 2019 Apr 23.

Cited by

Layer-specific changes in sensory cortex across the lifespan in mice and humans.

Nat Neurosci. 2025 Aug 11. doi: 10.1038/s41593-025-02013-1.

Dose-response mapping of bladder and rectum in prostate cancer patients undergoing radiotherapy with and without baseline toxicity correction.

Phys Imaging Radiat Oncol. 2025 Jul 1;35:100805. doi: 10.1016/j.phro.2025.100805. eCollection 2025 Jul.

Understanding 30-Day Mortality After First STEMI Through DAGs: Unravelling Epidemiological Cause-Effect Links.

Cureus. 2025 Jun 16;17(6):e86178. doi: 10.7759/cureus.86178. eCollection 2025 Jun.

Neuronal number and somal volume in calbindin-expressing neurons of the marmoset dorsal lateral geniculate nucleus are preserved during aging.

PLoS One. 2025 May 23;20(5):e0323906. doi: 10.1371/journal.pone.0323906. eCollection 2025.

Rising Water Levels and Vegetation Shifts Drive Substantial Reductions in Methane Emissions and Carbon Dioxide Uptake in a Great Lakes Coastal Freshwater Wetland.

Glob Chang Biol. 2025 Feb;31(2):e70053. doi: 10.1111/gcb.70053.

Brain Commun. 2024 Sep 19;6(5):fcae321. doi: 10.1093/braincomms/fcae321. eCollection 2024.

RAMZIS: a bioinformatic toolkit for rigorous assessment of the alterations to glycoprotein composition that occur during biological processes.

Bioinform Adv. 2024 Jan 25;4(1):vbae012. doi: 10.1093/bioadv/vbae012. eCollection 2024.

PERMUTOOLS: A MATLAB PACKAGE FOR MULTIVARIATE PERMUTATION TESTING.

ArXiv. 2024 Jan 17:arXiv:2401.09401v1.

Advice on comparing two independent samples of circular data in biology.

Sci Rep. 2021 Oct 13;11(1):20337. doi: 10.1038/s41598-021-99299-5.

References

Moving beyond P values: data analysis with estimation graphics.

Nat Methods. 2019 Jul;16(7):565-566. doi: 10.1038/s41592-019-0470-3.

Redefine statistical significance.

Nat Hum Behav. 2018 Jan;2(1):6-10. doi: 10.1038/s41562-017-0189-z.

Using simulation studies to evaluate statistical methods.

Stat Med. 2019 May 20;38(11):2074-2102. doi: 10.1002/sim.8086. Epub 2019 Jan 16.

Perspectives on the Use of Null Hypothesis Statistical Testing. Part II: Is Null Hypothesis Statistical Testing an Irregular Bulk of Masonry?

Educ Psychol Meas. 2017 Aug;77(4):613-615. doi: 10.1177/0013164416667987. Epub 2016 Oct 5.

Perspectives on the Use of Null Hypothesis Statistical Testing. Part III: The Various Nuts and Bolts of Statistical and Hypothesis Testing.

Educ Psychol Meas. 2017 Oct;77(5):816-818. doi: 10.1177/0013164416667988. Epub 2016 Oct 6.

Perspectives on the Use of Null Hypothesis Statistical Testing. Part I: The Mighty Frames of Scientific and Statistical Inference.

Educ Psychol Meas. 2017 Jun;77(3):471-474. doi: 10.1177/0013164416667986. Epub 2016 Oct 6.

Four simple ways to increase power without increasing the sample size.

Lab Anim. 2018 Dec;52(6):621-629. doi: 10.1177/0023677218767478. Epub 2018 Apr 8.

Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature.

PLoS Biol. 2017 Mar 2;15(3):e2000797. doi: 10.1371/journal.pbio.2000797. eCollection 2017 Mar.

Permutation-based inference for the AUC: A unified approach for continuous and discontinuous data.

Biom J. 2016 Nov;58(6):1319-1337. doi: 10.1002/bimj.201500105. Epub 2016 Aug 9.

1,500 scientists lift the lid on reproducibility.

Nature. 2016 May 26;533(7604):452-4. doi: 10.1038/533452a.
