Goodman S N
Division of Biostatistics, Oncology Center, Johns Hopkins University School of Medicine, Baltimore, MD.
Am J Epidemiol. 1993 Mar 1;137(5):485-96; discussion 497-501. doi: 10.1093/oxfordjournals.aje.a116700.
It is not generally appreciated that the p value, as conceived by R. A. Fisher, is not compatible with the Neyman-Pearson hypothesis test in which it has become embedded. The p value was meant to be a flexible inferential measure, whereas the hypothesis test was a rule for behavior, not inference. The combination of the two methods has led to a reinterpretation of the p value simultaneously as an "observed error rate" and as a measure of evidence. Both of these interpretations are problematic, and their combination has obscured the important differences between Neyman and Fisher on the nature of the scientific method and inhibited our understanding of the philosophic implications of the basic methods in use today. An analysis using another method promoted by Fisher, mathematical likelihood, shows that the p value substantially overstates the evidence against the null hypothesis. Likelihood makes clearer the distinction between error rates and inferential evidence and is a quantitative tool for expressing evidential strength that is more appropriate for the purposes of epidemiology than the p value.
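The claim that the p value overstates the evidence against the null hypothesis can be illustrated with a small calculation. The sketch below is not taken from the paper. It assumes a normally distributed test statistic and a two-sided p value, and compares the likelihood of the null hypothesis with that of the best-supported alternative, which under these assumptions is exp(-z^2/2). The function name min_likelihood_ratio is hypothetical and is used here only for illustration.

```python
import math
from statistics import NormalDist


def min_likelihood_ratio(p_two_sided: float) -> float:
    """Likelihood of the null relative to the best-supported alternative.

    Assumption (not from the abstract): the test statistic is standard
    normal under the null, and p_two_sided is a two-sided p value.
    Under that assumption the ratio equals exp(-z^2 / 2), where z is the
    normal deviate corresponding to the p value.
    """
    if not 0.0 < p_two_sided < 1.0:
        raise ValueError("p value must lie strictly between 0 and 1")
    # Convert the two-sided p value back to the absolute z score.
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    # Ratio of the normal likelihood at the null to its maximum.
    return math.exp(-0.5 * z * z)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        lr = min_likelihood_ratio(p)
        print(f"p = {p:<6} likelihood ratio (null vs. best alternative) = {lr:.4f}")
```

Under these assumptions, p = 0.05 corresponds to a likelihood ratio of roughly 0.15, so the data support the null about one-seventh as well as they support the most favored alternative. Because this is the smallest ratio that any single alternative can achieve, the evidence against the null is noticeably weaker than a naive "1 in 20" reading of the p value suggests.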