Institut für medizinische Biometrie, Epidemiologie und Informatik Mainz, Germany.
Dtsch Arztebl Int. 2010 Jan;107(4):50-6. doi: 10.3238/arztebl.2010.0050. Epub 2010 Jan 29.
When reading reports of medical research findings, one is usually confronted with p-values. Publications typically contain not just one p-value, but an abundance of them, mostly accompanied by the word "significant." This article is intended to help readers understand the problem of multiple p-values and how to deal with it.
When multiple p-values appear in a single study, this is usually a problem of multiple testing. A number of valid approaches are presented for dealing with the problem. This article is based on classical statistical methods as presented in many textbooks and on selected specialized literature.
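To make the scale of the problem concrete, the following is a minimal sketch rather than the authors' own procedure. It assumes independent tests with all null hypotheses true, and it uses the Bonferroni correction as one classical example of an adjustment; the abstract itself does not name the approaches it presents. The function names are illustrative.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false-positive result among `num_tests`
    independent tests, assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that remains significant after a Bonferroni
    correction, i.e. that is at most alpha divided by the number of tests."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # With 20 independent tests at alpha = 0.05, the chance of at least
    # one spurious "significant" finding is roughly 64%.
    print(round(familywise_error_rate(20), 2))  # 0.64

    # Hypothetical p-values, invented for illustration only.
    example = [0.001, 0.02, 0.04, 0.30]
    print(bonferroni_significant(example))  # [True, False, False, False]
```

In this invented example, three of the four p-values fall below 0.05 without correction, but only one survives the Bonferroni threshold of 0.0125.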
Conclusions from publications with many "significant" results should be judged with caution if the authors have not taken adequate steps to correct for multiple testing. Researchers should define the goal of their study clearly at the outset and, if possible, define a single primary endpoint a priori. If the study is of an exploratory or hypothesis-generating nature, it should be clearly stated that any positive results might be due to chance and will need to be confirmed in further targeted studies.
It is recommended that the word "significant" be used and interpreted with care. Readers should assess articles critically with regard to the problem of multiple testing. Authors should state the number of tests that were performed. Scientific articles should be judged on their scientific merit rather than by the number of times they contain the word "significant."