Albert Ludwigs University, Freiburg, Germany.
The Pennsylvania State University, University Park, PA, USA.
Cell Syst. 2018 Jun 27;6(6):631-635. doi: 10.1016/j.cels.2018.03.014.
Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementing these practices remains difficult because of the challenges of assembling software tools and their associated libraries, connecting tools into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components (a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages into entire pipelines) to achieve an unprecedented level of computational reproducibility. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.