Verdoux H, Salamon R
Service Universitaire de Psychiatrie du Pr Bourgeois, Université de Bordeaux II, Centre Carreire.
Encephale. 1997 Jan-Feb;23(1):19-21.
Hypothesis and significance testing is currently the most widely used method for reporting statistical results in the medical literature. However, this method has several limitations. The main one is the risk of misinterpreting the p value. The arbitrariness of the 5 percent threshold used to decide whether or not a result is statistically significant is not always kept in mind, and the concept of statistical significance may therefore be confused with that of clinical or biological relevance. The pitfalls of misinterpretation arise mostly because the p value gives no precise indication of the strength or direction of the association, or of the variability in the sample. Some experts therefore claim that hypothesis and significance testing should be avoided when reporting statistical results, and that the method based on estimation and confidence intervals should be more widely used. With this latter method, it is possible to know the direction of the association and the effect size (i.e. the strength of the association). The precision of the estimate, i.e. the variability of the estimate in the sample, can be assessed from the width of the confidence interval: the narrower the confidence interval, the more precise the estimate. The clinical relevance of the findings is therefore easier to infer from such results than from results reporting only p values. However, the estimation and confidence interval method is not without its own limitations. It is difficult to apply to non-parametric tests, and for some results, such as the comparison of mortality ratios, the p value is highly informative. Moreover, the risk of misinterpretation is not entirely ruled out when the estimation and confidence interval method is used.
In situations where both methods can be employed, the scientific community has not yet reached a definite consensus on which is the better way to report statistical results; some experts therefore suggest presenting both simultaneously, especially in clinical and epidemiological studies.
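The contrast between the two reporting methods can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not part of the original article: the function name `risk_difference`, the choice of a Wald (normal-approximation) interval for a difference between two proportions, the pooled two-sided z-test, and the three hypothetical studies. It reports, for each study, the estimate, its 95% confidence interval and the p value side by side.

```python
import math


def risk_difference(events_a, n_a, events_b, n_b, z_crit=1.959964):
    """Return (estimate, (ci_low, ci_high), p_value) for a difference in proportions.

    Illustrative only: Wald 95% interval and pooled two-sided z-test,
    both of which rely on a normal approximation.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b  # effect size and direction

    # Unpooled standard error, used for the confidence interval.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Pooled standard error under the null hypothesis of no difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return diff, ci, p_value


if __name__ == "__main__":
    # Hypothetical data: (label, events_a, n_a, events_b, n_b).
    studies = [
        ("small study, 10-point difference", 30, 100, 20, 100),
        ("large study, 10-point difference", 3000, 10000, 2000, 10000),
        ("very large study, 1-point difference", 10500, 50000, 10000, 50000),
    ]
    for label, ea, na, eb, nb in studies:
        diff, (low, high), p = risk_difference(ea, na, eb, nb)
        print(f"{label}: diff={diff:.3f}, 95% CI=({low:.3f}, {high:.3f}), p={p:.2g}")
```

In this sketch the first two hypothetical studies share the same estimated difference, but only the interval width shows how much more precise the larger one is. The third yields a small p value for a difference that may have little clinical relevance, which is the confusion between statistical significance and relevance described above.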