Chicco Davide, Sichenze Andrea, Jurman Giuseppe
Università di Milano-Bicocca, Milan, Italy.
University of Toronto, Toronto, Ontario, Canada.
BioData Min. 2025 Aug 20;18(1):56. doi: 10.1186/s13040-025-00465-6.
In an age when machine learning and artificial intelligence are broadly employed, traditional statistics can still provide insightful information and results quickly and at a low computational cost. Statistics, in fact, offers many useful tools to researchers, including a series of univariate statistical tests that can identify relationships between pairs of numeric samples: Student's t-test, the Mann-Whitney U test, the Chi-squared test, and the Kruskal-Wallis test. These tests generate several outcomes, including probability values (p-values), which are compared against a chosen significance threshold to decide whether or not to reject the null hypothesis. Although effective, these tests are often misused or employed in the wrong contexts, especially in biostatistics studies. Many scientific researchers do not seem to know how to choose one test over the others, and this misuse can lead to incorrect results and wrong conclusions. Here we present a simple theoretical and practical guide to the use of these four tests, first describing their theoretical properties and then displaying the results obtained by applying these tests to real-world medical datasets. Finally, we explain when and how to use each test based on the data types of the samples considered. Our study can have a strong impact on scientific research by potentially influencing future studies involving these tests. Our recommendations, in turn, can help researchers produce more reliable and sound scientific results, thus increasing the quality of scientific studies across various fields.
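As a small illustration of how a p-value is compared against a threshold, the following sketch runs a Pearson Chi-squared test of independence on a 2x2 contingency table using only the Python standard library. It is not code from the study: the function name `chi_squared_2x2`, the counts, and the threshold of 0.05 are assumptions made for this example, and no continuity correction is applied.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table.

    Hypothetical helper for illustration; no continuity correction.
    Returns the test statistic and its p-value (1 degree of freedom).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up counts, not taken from any dataset in the study:
# rows = group A / group B, columns = event / no event.
counts = [[30, 10], [20, 20]]
alpha = 0.05  # assumed significance threshold

statistic, p_value = chi_squared_2x2(counts)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}, decision: {decision}")
```

The decision is worded as "reject" or "fail to reject" because a p-value above the threshold does not show that the null hypothesis is true. For tables larger than 2x2, or for the other three tests, a statistics library would normally be used instead of a hand-written routine.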