Reproducible bioinformatics project: a community for reproducible bioinformatics analysis pipelines.

Affiliations

Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy.

Department of Oncology, University of Torino, Candiolo, Italy.

Publication information

BMC Bioinformatics. 2018 Oct 15;19(Suppl 10):349. doi: 10.1186/s12859-018-2296-x.

DOI:10.1186/s12859-018-2296-x
PMID:30367595
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6191970/
Abstract

BACKGROUND

Reproducibility is a key element of modern science and is mandatory for any industrial application. It is the ability to replicate an experiment independently of location and operator. A study can therefore be considered reproducible only if all the data used are available and the computational analysis workflow is clearly described. Today, however, the raw data and the list of tools used in a workflow may not be enough to reproduce a complex bioinformatics analysis. Different releases of the same tools, or of the system libraries those tools rely on, can introduce subtle reproducibility issues.
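To make the versioning problem concrete, here is a minimal Python sketch that compares a recorded environment manifest with the current one and reports every mismatch. The manifest format, the function name `version_drift`, and the tool names and versions are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def version_drift(recorded, current):
    """Return {name: (recorded_version, current_version)} for every mismatch.

    A tool or library present in only one manifest is reported with None
    on the side where it is missing.
    """
    names = sorted(set(recorded) | set(current))
    return {
        name: (recorded.get(name), current.get(name))
        for name in names
        if recorded.get(name) != current.get(name)
    }


# Hypothetical manifests: names and versions are illustrative assumptions.
recorded = {"aligner": "2.5.3", "libz": "1.2.8", "R": "3.4.1"}
current = {"aligner": "2.6.0", "libz": "1.2.8", "R": "3.4.1"}

drift = version_drift(recorded, current)
print(drift)
```

A check like this can only flag drift after the fact. Shipping the tools inside a fixed container image, as the project proposes, aims to prevent the drift in the first place.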

RESULTS

To address this challenge, we established the Reproducible Bioinformatics Project (RBP), a non-profit, open-source project that aims to provide a schema and an infrastructure, based on Docker images and an R package, for producing reproducible results in bioinformatics. One or more Docker images are defined for a workflow (typically one per task), while the workflow itself is implemented as R functions embedded in a package available in a GitHub repository. A bioinformatician joining the project must first integrate her/his workflow modules into one or more Docker images, using an Ubuntu Docker image developed ad hoc by RBP to make this task easier. Second, the workflow must be implemented in R following an R skeleton function provided by RBP, which guarantees homogeneity and reusability across RBP functions. Finally, she/he must provide an R vignette explaining the package functionality, together with an example dataset that helps users gain confidence in using the workflow.
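The pattern described above, a thin wrapper function that delegates each task to a pinned Docker image, can be sketched as follows. This is an illustration in Python, not the R skeleton function that RBP provides. The function name `build_docker_command`, the image name `example/rbp-demo-align:1.0`, the task arguments, and the `/data` mount point are all hypothetical assumptions. The sketch only builds the `docker run` command line, so it can be inspected without Docker installed.

```python
import shlex


def build_docker_command(image, task_args, data_dir):
    """Build a `docker run` command for one workflow task.

    The image must carry an explicit tag or digest so that the same tool
    and library versions are used on every run.
    """
    last_component = image.rsplit("/", 1)[-1]
    if ":" not in last_component and "@" not in last_component:
        raise ValueError(f"image {image!r} has no explicit tag or digest")
    # --rm removes the container after the task finishes;
    # -v mounts the host data directory at /data inside the container.
    return ["docker", "run", "--rm", "-v", f"{data_dir}:/data", image, *task_args]


# Hypothetical image, arguments, and paths, for illustration only.
command = build_docker_command(
    "example/rbp-demo-align:1.0",
    ["align", "/data/sample.fastq"],
    "/home/user/project",
)
print(shlex.join(command))
```

Rejecting untagged image names is one simple way to stop a wrapper from silently picking up whichever image version happens to be the latest. To actually run the task, the returned list could be passed to `subprocess.run`.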

CONCLUSIONS

The Reproducible Bioinformatics Project provides a general schema and an infrastructure for distributing robust and reproducible workflows. It thus guarantees that end users can consistently repeat any analysis regardless of the UNIX-like architecture used.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af8/6191970/b5cbeac9aca5/12859_2018_2296_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af8/6191970/a4c942de32bd/12859_2018_2296_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af8/6191970/a1fa49506745/12859_2018_2296_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af8/6191970/6256f57f7327/12859_2018_2296_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af8/6191970/bc68cd7246ab/12859_2018_2296_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af8/6191970/d2ae49b76079/12859_2018_2296_Fig6_HTML.jpg
