Williams J L, Hathaway C A, Kloster K L, Layne B H
Department of Physiology and Pharmacology, School of Medicine, University of South Dakota, Vermillion 57069, USA.
Am J Physiol. 1997 Jul;273(1 Pt 2):H487-93. doi: 10.1152/ajpheart.1997.273.1.H487.
In the biomedical literature, measurements are frequently considered "not statistically different" if a statistical test fails to achieve a P value ≤ 0.05. This conclusion may be misleading when group sizes are too small or variability is too large, because a type II error (false negative) may have been committed. In this study, we examined the probability of detecting a real difference (power) and the probability of type II errors in unpaired t-tests in Volumes 246 and 266 of the American Journal of Physiology: Heart and Circulatory Physiology. In addition, we examined all articles for other statistical errors. The median power of the t-tests was similar in the two volumes (approximately 0.55 to detect a 20% change and approximately 0.92 to detect a 50% change). In both volumes, approximately 80% of the studies with nonsignificant unpaired t-tests contained at least one t-test with a type II error probability > 0.30. Our findings suggest that low power and a high incidence of type II errors are common problems in this journal. In addition, the presentation of statistics was often vague, t-tests were frequently misused, and assumptions for inferential statistics usually were not mentioned or examined.
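As a rough illustration of the quantities discussed above, the following sketch estimates the power of a two-sided unpaired comparison to detect a given relative change in the mean, along with the corresponding type II error probability (1 - power). It uses a normal approximation rather than an exact noncentral t calculation, so it slightly overestimates power for small groups, and it is not the procedure used in the study. The function name `approx_power`, the coefficient of variation `cv=0.25`, and the group size `n_per_group=8` are hypothetical values chosen only for the example.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(rel_change, cv, n_per_group, alpha=0.05):
    """Approximate power of a two-sided unpaired test (normal approximation).

    rel_change  : difference between group means as a fraction of the control mean
    cv          : coefficient of variation (SD / mean), assumed equal in both groups
    n_per_group : number of observations in each group
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected difference in means divided by its standard error
    shift = rel_change / (cv * sqrt(2.0 / n_per_group))
    # Probability of landing in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical design: 8 animals per group, CV of 25%
    for change in (0.20, 0.50):
        power = approx_power(change, cv=0.25, n_per_group=8)
        print(f"{change:.0%} change: power ~ {power:.2f}, "
              f"type II error ~ {1 - power:.2f}")
```

Under these assumptions, power rises with the size of the change to be detected and with the number of observations per group, and falls as variability increases, which is why a nonsignificant result from a small, variable sample says little about whether a real difference exists.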