Scalable computing for evolutionary genomics.

Authors

Prins Pjotr, Belhachemi Dominique, Möller Steffen, Smant Geert

Affiliation

Laboratory of Nematology, Wageningen University, Wageningen, The Netherlands.

Publication

Methods Mol Biol. 2012;856:529-45. doi: 10.1007/978-1-61779-585-5_22.

DOI: 10.1007/978-1-61779-585-5_22
PMID: 22399474

Abstract

Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software.
Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
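
The "poor man's parallelization" described in the abstract, running whole programs in parallel as separate processes, can be illustrated with a minimal sketch. This is not BioNode's configuration scripts and not a real job scheduler: a small thread pool stands in for the scheduler, and the helper names `run_job`, `run_all`, and `make_placeholder_job` are hypothetical. The placeholder job only prints a message; in practice each command would presumably be a full invocation of an unmodified tool.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one whole program as a separate OS process and return its output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_all(commands, max_workers=4):
    """Run independent commands concurrently; results keep the input order.

    Each worker thread only waits on its child process, so the actual
    parallelism comes from the separate processes, not from the threads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


def make_placeholder_job(name):
    # Hypothetical stand-in for a real tool invocation; it only echoes a
    # message so the sketch runs anywhere Python is installed.
    return [sys.executable, "-c", f"print('finished {name}')"]


if __name__ == "__main__":
    jobs = [make_placeholder_job(f"gene_family_{i}") for i in range(1, 4)]
    for line in run_all(jobs):
        print(line)
```

The program being run needs no changes, which is why the abstract calls this approach attractive for legacy software. On a cluster, a job scheduler would take the role that the thread pool plays here.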

Similar articles

1
Scalable computing for evolutionary genomics.
Methods Mol Biol. 2012;856:529-45. doi: 10.1007/978-1-61779-585-5_22.
2
Scalable Workflows and Reproducible Data Analysis for Genomics.
Methods Mol Biol. 2019;1910:723-745. doi: 10.1007/978-1-4939-9074-0_24.
3
Sharing programming resources between Bio* projects through remote procedure call and native call stack strategies.
Methods Mol Biol. 2012;856:513-27. doi: 10.1007/978-1-61779-585-5_21.
4
Constructing computational pipelines.
Methods Mol Biol. 2008;453:451-70. doi: 10.1007/978-1-60327-429-6_24.
5
Virtualization and cloud computing in dentistry.
J Mass Dent Soc. 2014 Spring;63(1):14-7.
6
Providing Assistive Technology Applications as a Service Through Cloud Computing.
Assist Technol. 2015 Spring;27(1):44-51. doi: 10.1080/10400435.2014.963258.
7
Windows .NET Network Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST).
BMC Bioinformatics. 2005 Apr 8;6:93. doi: 10.1186/1471-2105-6-93.
8
PhyloGena--a user-friendly system for automated phylogenetic annotation of unknown sequences.
Bioinformatics. 2007 Apr 1;23(7):793-801. doi: 10.1093/bioinformatics/btm016. Epub 2007 Mar 1.
9
Cluster computing for digital microscopy.
Microsc Res Tech. 2004 Jun 1;64(2):204-13. doi: 10.1002/jemt.20065.
10
R/parallel--speeding up bioinformatics analysis with R.
BMC Bioinformatics. 2008 Sep 22;9:390. doi: 10.1186/1471-2105-9-390.

Cited by

1
Parallel computing in genomic research: advances and applications.
Adv Appl Bioinform Chem. 2015 Nov 13;8:23-35. doi: 10.2147/AABC.S64482. eCollection 2015.