Computational reproducibility of Jupyter notebooks from biomedical publications.
AFFILIATIONS
Heinz-Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena 07743, Germany.
Michael Stifel Center Jena, Jena 07743, Germany.
PUBLICATION INFORMATION
Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giad113.
BACKGROUND
Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows, including for research publications. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications.
APPROACH
We address computational reproducibility at 2 levels. (i) Using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks associated with publications indexed in the biomedical literature repository PubMed Central. We identified such notebooks by mining the articles' full text, tried to locate them on GitHub, and attempted to rerun them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. (ii) This study is itself a reproducibility attempt: we applied essentially the same methodology to PubMed Central twice over the course of 2 years, during which the corpus of Jupyter notebooks from articles indexed in PubMed Central grew in a highly dynamic fashion.
RESULTS
Out of 27,271 Jupyter notebooks from 2,660 GitHub repositories associated with 3,467 publications, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirement files and that we attempted to rerun automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we reran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions.
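The counts above form a funnel, and the attrition at each stage is easier to see as rates. The short sketch below uses only the figures reported in this abstract; the stage labels and the helper name `stage_rates` are illustrative choices, not terms from the study.

```python
# Counts taken from the results paragraph; labels are illustrative.
FUNNEL = [
    ("all notebooks", 27271),
    ("written in Python", 22578),
    ("dependencies declared", 15817),
    ("dependencies installed", 10388),
    ("ran without errors", 1203),
    ("identical results", 879),
]


def stage_rates(stages):
    """For each stage after the first, return
    (label, count, share of previous stage, share of first stage)."""
    total = stages[0][1]
    rows = []
    for (_, previous), (label, count) in zip(stages, stages[1:]):
        rows.append((label, count, count / previous, count / total))
    return rows


if __name__ == "__main__":
    for label, count, of_previous, of_total in stage_rates(FUNNEL):
        print(f"{label:24s} {count:6d}  "
              f"{of_previous:6.1%} of previous  {of_total:6.1%} of all")
    # Notebooks that were rerun but raised exceptions.
    print("raised exceptions:", 10388 - 1203)
```

Relative to the 10,388 notebooks that were actually rerun, about 11.6% ran without errors and about 8.5% reproduced the original results; the remaining 9,185 raised exceptions.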
CONCLUSIONS
We zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.