Précis of statistical significance: rationale, validity, and utility.

Author information

Chow S L

Affiliation

Department of Psychology, University of Regina, Saskatchewan, Canada.

Publication information

Behav Brain Sci. 1998 Apr;21(2):169-94; discussion 194-239. doi: 10.1017/s0140525x98001162.

DOI:10.1017/s0140525x98001162

PMID:10097013

Abstract

The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. 
At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.
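The contrast the abstract draws between statistical significance (one sampling distribution, based on H0) and statistical power (two distributions) can be sketched numerically. The following is a minimal illustration, not taken from the article: it assumes a one-sided z-test with known unit variance, and the function names `critical_value`, `is_significant`, and `power` are hypothetical.

```python
import math
from statistics import NormalDist

# Assumption: one-sided z-test with known unit variance (not from the article).
NULL_DIST = NormalDist(mu=0.0, sigma=1.0)  # sampling distribution of z under H0


def critical_value(alpha: float = 0.05) -> float:
    """Cutoff for z, derived from the H0 sampling distribution alone."""
    return NULL_DIST.inv_cdf(1.0 - alpha)


def is_significant(z: float, alpha: float = 0.05) -> bool:
    """The significance decision uses only the H0 distribution."""
    return z > critical_value(alpha)


def power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power needs a second distribution: z under a specific alternative.

    Under the assumed alternative, z is Normal(effect_size * sqrt(n), 1).
    Power is the probability, conditional on that alternative, that z
    exceeds the cutoff fixed by the H0 distribution.
    """
    alt_dist = NormalDist(mu=effect_size * math.sqrt(n), sigma=1.0)
    return 1.0 - alt_dist.cdf(critical_value(alpha))


if __name__ == "__main__":
    print(is_significant(2.0))   # decision based on H0 only
    print(power(0.5, 25))        # requires an assumed effect size and n
```

In this sketch `is_significant` never refers to an alternative distribution, whereas `power` cannot be computed without assuming one, which is the asymmetry the abstract points to.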
