Researcher degrees of freedom in statistical software contribute to unreliable results: A comparison of nonparametric analyses conducted in SPSS, SAS, Stata, and R.

Authors

Hodges Cooper B, Stone Bryant M, Johnson Paula K, Carter James H, Sawyers Chelsea K, Roby Patricia R, Lindsey Hannah M

Affiliations

Department of Neurology, University of Utah School of Medicine, Salt Lake City, UT, USA.

Department of Psychology, Brigham Young University, Provo, UT, USA.

Publication information

Behav Res Methods. 2023 Sep;55(6):2813-2837. doi: 10.3758/s13428-022-01932-2. Epub 2022 Aug 11.

DOI: 10.3758/s13428-022-01932-2
PMID: 35953660

Abstract

Researcher degrees of freedom can affect the results of hypothesis tests and consequently, the conclusions drawn from the data. Previous research has documented variability in accuracy, speed, and documentation of output across various statistical software packages. In the current investigation, we conducted Pearson's chi-square test of independence, Spearman's rank-ordered correlation, Kruskal-Wallis one-way analysis of variance, Wilcoxon Mann-Whitney U rank-sum tests, and Wilcoxon signed-rank tests, along with estimates of skewness and kurtosis, on large, medium, and small samples of real and simulated data in SPSS, SAS, Stata, and R and compared the results with those obtained through hand calculation using the raw computational formulas. Multiple inconsistencies were found in the results produced between statistical packages due to algorithmic variation, computational error, and statistical output. The most notable inconsistencies were due to algorithmic variations in the computation of Pearson's chi-square test conducted on 2 × 2 tables, where differences in p-values reported by different software packages ranged from .005 to .162, largely as a function of sample size. We discuss how such inconsistencies may influence the conclusions drawn from the results of statistical analyses depending on the statistical software used, and we urge researchers to analyze their data across multiple packages to check for inconsistencies and report details regarding the statistical procedure used for data analysis.
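
As a rough illustration of how an algorithmic choice can move the p-value of Pearson's chi-square test on a 2 × 2 table, the sketch below computes the statistic with and without Yates' continuity correction. The abstract does not say which algorithmic variations the authors identified, so treating the continuity correction as the source of disagreement is an assumption made here for illustration only. The function name `chi2_2x2` and the cell counts are hypothetical and are not taken from the paper.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test of independence for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With yates=True, Yates' continuity
    correction is applied before squaring. Assumes every row and column
    total is nonzero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d

    # Shortcut formula for 2 x 2 tables: n * (ad - bc)^2 / product of margins.
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink |ad - bc| by n/2, never below zero.
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)

    # For 1 degree of freedom the upper-tail chi-square probability
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical small-sample table, chosen only for illustration.
table = (12, 5, 6, 13)
stat_plain, p_plain = chi2_2x2(*table, yates=False)
stat_yates, p_yates = chi2_2x2(*table, yates=True)

print(f"uncorrected: chi2 = {stat_plain:.4f}, p = {p_plain:.4f}")
print(f"Yates:       chi2 = {stat_yates:.4f}, p = {p_yates:.4f}")
```

The same table yields two different p-values depending on one option, and the gap shrinks as the cell counts grow, which is in line with the abstract's remark that the differences were largely a function of sample size. Reporting whether a correction was applied is one concrete way to follow the authors' advice to document the statistical procedure used.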

