Practical Computational Reproducibility in the Life Sciences.

Affiliations

Albert Ludwigs University, Freiburg, Germany.

The Pennsylvania State University, University Park, PA, USA.

Publication information

Cell Syst. 2018 Jun 27;6(6):631-635. doi: 10.1016/j.cels.2018.03.014.

PMID: 29953862

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6263957/

Abstract

Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains difficult due to the challenge of assembling software tools plus associated libraries, connecting tools together into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components (a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines) to achieve an unprecedented level of computational reproducibility. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.
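As a rough illustration of the first component, the sketch below writes out a conda environment file in which every tool is pinned to an exact version, so the same software stack can be rebuilt later. It is not taken from the paper: the function name `render_environment`, the environment name, and the tool versions are hypothetical choices made for this example, and the channel order (conda-forge, then bioconda) is an assumption about a typical Bioconda setup.

```python
def render_environment(name, pinned_tools):
    """Return the text of a conda environment file with exact version pins.

    `pinned_tools` maps a package name to the exact version to install.
    Both the helper and its arguments are illustrative, not part of any
    published API.
    """
    lines = [
        f"name: {name}",
        "channels:",
        "  - conda-forge",  # assumed channel order for a Bioconda setup
        "  - bioconda",
        "dependencies:",
    ]
    # Sort the packages so the same pins always yield the same file text.
    for tool, version in sorted(pinned_tools.items()):
        lines.append(f"  - {tool}={version}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example pins only; substitute the versions an analysis actually used.
    print(render_environment("rnaseq-demo", {"hisat2": "2.1.0", "samtools": "1.9"}))
```

Saving the printed text as `environment.yml` and running `conda env create -f environment.yml` would then recreate the environment, provided the pinned versions are still available in those channels.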
