International Initiative for Impact Evaluation (3ie), Washington, District of Columbia, United States of America.
Department of Economics, University of Copenhagen, Copenhagen, Denmark.
PLoS One. 2018 Dec 21;13(12):e0209416. doi: 10.1371/journal.pone.0209416. eCollection 2018.
Empirical research that cannot be reproduced using the original dataset and software code (replication files) creates a credibility challenge, as it means those published findings are not verifiable. This study reports the results of a research audit exercise, known as the push button replication project, that tested a sample of studies that use similar empirical methods but span a variety of academic fields.
We developed and piloted a detailed protocol for conducting push button replication and determining the level of comparability of these replication findings to original findings. We drew a sample of articles from the ten journals that published the most impact evaluations from low- and middle-income countries from 2010 through 2012. This set includes health, economics, and development journals. We then selected all articles in these journals published in 2014 that meet the same inclusion criteria and implemented the protocol on the sample.
Of the 109 articles in our sample, only 27 are push button replicable, meaning the provided code, run on the provided dataset, produces comparable findings for the key results in the published article. The authors of 59 of the articles refused to provide replication files. Thirty of these 59 articles were published in journals that had replication file requirements in 2014, meaning these articles are non-compliant with their journals' requirements. For the remaining 23 of the 109 articles, we confirmed that three used proprietary data, we received incomplete replication files for 15, and we found minor differences in the replication results for five.
The findings presented here reveal that many economics, development, and public health researchers are a long way from adopting the norm of open research. Journals do not appear to be playing a strong role in ensuring the availability of replication files.