Institute of Technology, University of Washington, Tacoma, WA, USA.
Department of Clinical Investigation, Madigan Army Medical Center, Tacoma, WA, USA.
J Am Med Inform Assoc. 2018 Jan 1;25(1):4-12. doi: 10.1093/jamia/ocx120.
Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server.
We present four interactive Jupyter notebooks that use R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data, and process KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder.
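As a rough illustration of how a notebook stored on GitHub can be opened through Binder, the following minimal sketch builds a launch link for the public mybinder.org service. The function name `binder_url`, the repository names, and the notebook file name are hypothetical placeholders, not the authors' repositories. The `/v2/gh/<owner>/<repo>/<ref>` path follows the public Binder URL convention, and the `filepath` query parameter is assumed here; a private BinderHub deployment or a newer Binder release may expect a different host or parameter.

```python
from urllib.parse import quote

# Public Binder service; a private server would use a different host (assumption).
BINDER_HOST = "https://mybinder.org"


def binder_url(owner, repo, ref="HEAD", notebook=None):
    """Build a Binder launch URL for a notebook in a GitHub repository.

    owner, repo: GitHub account and repository holding the notebook.
    ref: branch, tag, or commit to build the container from.
    notebook: optional path of the notebook to open after launch.
    """
    url = "{}/v2/gh/{}/{}/{}".format(
        BINDER_HOST, quote(owner, safe=""), quote(repo, safe=""), quote(ref, safe="")
    )
    if notebook:
        # 'filepath' is assumed; percent-encode the path so slashes survive.
        url += "?filepath=" + quote(notebook, safe="")
    return url


# Hypothetical repository and notebook names, for illustration only.
print(binder_url("example-user", "example-notebooks", "master", "workflow.ipynb"))
```

Opening such a link asks Binder to build a container from the repository's configuration files and then serve the notebook in the browser, which is why no local installation is needed.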
BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods.
Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary and as ubiquitous for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols.