Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, 6130 Executive Boulevard, Bethesda, MD 20892-7354, USA.
Biostatistics. 2010 Jul;11(3):413-8. doi: 10.1093/biostatistics/kxq004. Epub 2010 Feb 19.
With the analysis of complex, messy data sets, the statistics community has recently focused attention on "reproducible research," namely research that can be readily replicated by others. One proposed standard is that data sets and computer code be made available. In some situations, however, raw data cannot be disseminated, either for reasons of confidentiality or because the data are so messy that dissemination is impractical. For one such situation, we propose two steps for reproducible research: (i) presentation of a table of data and (ii) presentation of a formula to estimate key quantities from the table of data. We illustrate this strategy in the analysis of data from the Prostate Cancer Prevention Trial, which investigated the effect of the drug finasteride versus placebo on the period prevalence of prostate cancer. With such an important result at stake, a transparent analysis was essential.
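The two-step strategy, a published table plus a formula applied to it, can be sketched as follows. This is only a minimal illustration and not the estimator used in the paper: the counts are hypothetical, the function name `relative_risk` is our own, and the formula is a plain ratio of two proportions with a Wald interval on the log scale. The published analysis may involve further adjustments that this sketch does not attempt.

```python
import math


def relative_risk(cases_treated, n_treated, cases_control, n_control, z=1.96):
    """Estimate a ratio of two proportions from tabulated counts.

    Returns (rr, lower, upper), where the interval is a Wald interval
    computed on the log scale. Hypothetical helper, not from the paper.
    """
    p_treated = cases_treated / n_treated
    p_control = cases_control / n_control
    rr = p_treated / p_control

    # Delta-method standard error of log(rr) for independent binomial arms.
    se_log_rr = math.sqrt(
        1 / cases_treated - 1 / n_treated + 1 / cases_control - 1 / n_control
    )

    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Step (i): the table of data. These counts are made up for illustration
# and are NOT the Prostate Cancer Prevention Trial results.
table = {
    "treated": {"cases": 30, "n": 200},
    "control": {"cases": 60, "n": 200},
}

# Step (ii): the formula applied to the table.
rr, lower, upper = relative_risk(
    table["treated"]["cases"],
    table["treated"]["n"],
    table["control"]["cases"],
    table["control"]["n"],
)
print(f"relative risk = {rr:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

Because the estimate depends only on the four tabulated counts, a reader can check it by hand or rerun it without access to the raw records, which is the point of publishing the table and the formula together.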