Feinstein A R
Yale University School of Medicine, New Haven, Connecticut 06510, USA.
J Clin Epidemiol. 1998 Apr;51(4):355-60. doi: 10.1016/s0895-4356(97)00295-3.
For both P-values and confidence intervals, an alpha level is chosen to set limits of acceptable probability for the role of chance in the observed distinctions. The level of alpha is used either for direct comparison with a single P-value, or for determining the extent of a confidence interval. "Statistical significance" is proclaimed if the calculations yield a P-value that is below alpha, or a 1-alpha confidence interval whose range excludes the null result of "no difference." Both the P-value and confidence-interval methods are essentially reciprocal, since they use the same principles of probabilistic calculation; and both can yield distorted or misleading results if the data do not adequately conform to the underlying mathematical requirements. The major scientific disadvantage of both methods is that their "significance" is merely an inference derived from principles of mathematical probability, not an evaluation of substantive importance for the "big" or "small" magnitude of the observed distinction. The latter evaluation has not received adequate attention during the emphasis on probabilistic decisions; and careful principles have not been developed either for the substantive reasoning or for setting appropriate boundaries for "big" or "small." After a century of "significance" inferred exclusively from probabilities, a basic scientific challenge is to develop methods for deciding what is substantively impressive or trivial.
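The reciprocity described above can be illustrated with a minimal sketch. It assumes a two-sided z-test on an observed difference with a known standard error and a null value of zero. The function names and all numbers are hypothetical, chosen only for illustration; none of them come from the article.

```python
from statistics import NormalDist


def z_test_and_ci(diff, se, alpha=0.05):
    """Return the two-sided P-value and the (1 - alpha) confidence interval
    for an observed difference, assuming a normal sampling distribution."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z = diff / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return p_value, ci


def significant_by_p(p_value, alpha=0.05):
    """'Significant' if the P-value falls below alpha."""
    return p_value < alpha


def significant_by_ci(ci, null_value=0.0):
    """'Significant' if the interval excludes the null result."""
    lower, upper = ci
    return not (lower <= null_value <= upper)


if __name__ == "__main__":
    # Hypothetical (difference, standard error) pairs.
    for diff, se in [(2.0, 0.9), (1.5, 0.9), (0.01, 0.001)]:
        p, ci = z_test_and_ci(diff, se)
        print(diff, se, round(p, 4), ci,
              significant_by_p(p), significant_by_ci(ci))
```

Under these assumptions the two decision rules always agree, because both reduce to comparing |diff / se| with the same critical value. The last hypothetical pair also shows the abstract's main concern: a difference of 0.01 is declared "significant" by both rules, yet neither rule says whether a difference that small is substantively important.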