Agile parallel bioinformatics workflow management using Pwrake.

Author information

Mishima Hiroyuki, Sasaki Kensaku, Tanaka Masahiro, Tatebe Osamu, Yoshiura Koh-Ichiro

Affiliation

Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, 1-12-4 Sakamoto, Nagasaki, Nagasaki, Japan.

Publication information

BMC Res Notes. 2011 Sep 8;4:331. doi: 10.1186/1756-0500-4-331.

PMID: 21899774
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3180464/
Abstract

BACKGROUND

In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to meet the demand for workflow management. However, such systems tend to be too heavyweight for actual bioinformatics practice. We recognize that scientific workflow management often prioritizes two things: quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment. These priorities fit well with agile software development, which proceeds through iterative development phases driven by trial and error. Here, we show the application of the scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Rake, Ruby's standard build tool, and its flexibility has been demonstrated in the astronomy domain. We therefore hypothesize that Pwrake also has advantages for actual bioinformatics workflows.
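Rake, which Pwrake extends, describes a workflow as tasks whose prerequisites form a dependency graph; a `file` task runs only when its output is missing or older than its inputs. The sketch below uses plain Rake (not Pwrake, whose parallel and distributed scheduling it does not show) to express a three-step sequential chain. The file names and steps are hypothetical stand-ins for real tools such as an aligner or a variant caller; each step only rewrites text so the example runs anywhere Ruby and its bundled Rake are installed.

```ruby
require "rake"   # Rake is bundled with standard Ruby installations
require "tmpdir"

# Hypothetical file names; each step stands in for a real tool invocation
# (aligner, sorter, variant caller) and only rewrites text so the sketch runs anywhere.
WORKDIR = Dir.mktmpdir("rake-sketch")
reads   = File.join(WORKDIR, "sample.fastq")
aligned = File.join(WORKDIR, "sample.sam")
sorted  = File.join(WORKDIR, "sample.sorted.sam")
calls   = File.join(WORKDIR, "sample.vcf")
RUN_LOG = []   # records which steps actually executed

File.write(reads, "ACGT\n")

# A `file` task names its output, lists its input as a prerequisite, and runs its
# block only when the output is missing or older than its prerequisites.
file aligned => reads do |t|
  RUN_LOG << :align
  File.write(t.name, "aligned:" + File.read(t.prerequisites.first))
end

file sorted => aligned do |t|
  RUN_LOG << :sort
  File.write(t.name, "sorted:" + File.read(t.prerequisites.first))
end

file calls => sorted do |t|
  RUN_LOG << :call
  File.write(t.name, "called:" + File.read(t.prerequisites.first))
end

# Re-arm the tasks so this single process can invoke them again
# (a fresh `rake` run would not need this).
def run_targets(*targets)
  targets.each { |name| Rake::Task[name].reenable }
  Rake::Task[targets.last].invoke
end

run_targets(aligned, sorted, calls)   # first run: all steps execute in dependency order
run_targets(aligned, sorted, calls)   # nothing changed: every output is up to date
File.delete(calls)                    # simulate losing only the final product
run_targets(aligned, sorted, calls)   # only the missing last step is rebuilt

p RUN_LOG   # => [:align, :sort, :call, :call]
```

The second call does nothing because every output is newer than its inputs, and deleting only the final output reruns only the last step. That incremental behavior is what keeps trial-and-error iteration cheap.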

FINDINGS

We implemented Pwrake workflows to process next-generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. The GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that, in practice, scientific workflow development iterates over two phases: the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help developers focus on each of the two phases, as well as helper methods that simplify the workflow descriptions. This approach increased the efficiency of iterative development. Moreover, we implemented combined workflows to demonstrate the modularity of the GATK and Dindel workflows.
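One way to keep the two development phases apart is to hold tunable values in one place and hide Rake boilerplate behind a small helper. This is a sketch under our own assumptions, not the authors' published rakefiles (those are in the misshie/Workflows repository): the `PARAMS` hash, the helper `step`, the quality threshold, and the toy quality table are all hypothetical.

```ruby
require "rake"
require "tmpdir"

# ---- Parameters (hypothetical) -----------------------------------------
# During parameter adjustment only this hash changes; in a real project it
# could live in its own file that the rakefile loads.
PARAMS = { min_quality: 20 }

# ---- Helper method (hypothetical) --------------------------------------
# Wraps Rake's `file` so each workflow step is one call: derive the output
# name from the input, declare the dependency, and apply `transform`.
def step(input, suffix, &transform)
  base   = File.basename(input, File.extname(input))
  output = File.join(File.dirname(input), base + suffix)
  file output => input do |t|
    File.write(t.name, transform.call(File.read(t.prerequisites.first)))
  end
  output
end

# ---- Workflow definition -----------------------------------------------
OUT_DIR = Dir.mktmpdir("params-sketch")
raw = File.join(OUT_DIR, "sample.txt")
File.write(raw, "var1 35\nvar2 12\nvar3 28\n")   # toy "variant quality" table

filtered = step(raw, ".filtered.txt") do |text|
  # Keep rows whose second column meets the threshold read from PARAMS.
  text.lines.select { |line| line.split[1].to_i >= PARAMS[:min_quality] }.join
end
counted = step(filtered, ".count.txt") { |text| "#{text.lines.size}\n" }

Rake::Task[counted].invoke
print File.read(filtered)   # var1 35 and var3 28 pass the threshold
print File.read(counted)    # => 2
```

Changing `PARAMS[:min_quality]` alters the filter without touching any task definition, and adding a step is one more `step` call. Rake decides what to rerun from file timestamps only, so after a parameter change the affected outputs must be removed or the run forced (for example with `rake --build-all`).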

CONCLUSIONS

Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain-specific language (DSL) design, built on Ruby, gives rakefiles the flexibility needed to write scientific workflows. Furthermore, the readability and maintainability of rakefiles may facilitate sharing workflows within the scientific community. The GATK and Dindel workflows are available at http://github.com/misshie/Workflows.
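Because a rakefile is ordinary Ruby, loops and variables can generate tasks, and combining workflows reduces to declaring a task that depends on each workflow's final product. The sketch below fans one input out into per-region jobs, roughly the shape of a parallel workflow such as the Dindel one, and pairs it with a sequential branch. It uses plain Rake's `multitask`, which runs prerequisites on threads inside one process; Pwrake's own parallel execution is not shown. `REGIONS`, the file names, and the placeholder step bodies are hypothetical.

```ruby
require "rake"
require "tmpdir"

WORK     = Dir.mktmpdir("dsl-sketch")
REGIONS  = %w[chr1 chr2 chr3]   # hypothetical regions processed independently
FINISHED = Queue.new            # thread-safe record of region jobs that ran

bam = File.join(WORK, "sample.bam")   # hypothetical shared input (plain text here)
File.write(bam, "read1\nread2\n")
file bam   # declare the existing input as a file task with no action

# A plain Ruby loop stamps out one file task per region.
region_outputs = REGIONS.map do |region|
  out = File.join(WORK, "#{region}.indels.txt")
  file out => bam do |t|
    FINISHED << region
    File.write(t.name, "#{region}: indels\n")
  end
  out
end

# `multitask` invokes its prerequisites concurrently on Rake's thread pool,
# giving the fan-out shape of a per-region parallel workflow.
multitask :indel_regions => region_outputs

merged = File.join(WORK, "all.indels.txt")
file merged => :indel_regions do |t|
  # Concatenate in REGIONS order, independent of which thread finished first.
  File.write(t.name, region_outputs.map { |f| File.read(f) }.join)
end

# A second, sequential branch over the same input.
summary = File.join(WORK, "sample.summary.txt")
file summary => bam do |t|
  File.write(t.name, "#{File.read(t.prerequisites.first).lines.size} reads\n")
end

# Combining the two workflows is just another task depending on both products.
task :all => [merged, summary]
Rake::Task[:all].invoke

print File.read(merged)    # one line per region, in REGIONS order
print File.read(summary)   # => 2 reads
```

Since `merged` reads the region files in `REGIONS` order, its contents do not depend on thread scheduling, and either branch can still be invoked on its own by name.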
