Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.

Author information

Afgan Enis, Sloggett Clare, Goonasekera Nuwan, Makunin Igor, Benson Derek, Crowe Mark, Gladman Simon, Kowsar Yousef, Pheasant Michael, Horst Ron, Lonie Andrew

Affiliations

Victorian Life Sciences Computation Initiative (VLSCI), University of Melbourne, Melbourne, Victoria, Australia; Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America; Centre for Computing and Informatics (CIR), Rudjer Boskovic Institute (RBI), Zagreb, Croatia.

Victorian Life Sciences Computation Initiative (VLSCI), University of Melbourne, Melbourne, Victoria, Australia.

Publication information

PLoS One. 2015 Oct 26;10(10):e0140829. doi: 10.1371/journal.pone.0140829. eCollection 2015.

DOI:10.1371/journal.pone.0140829

PMID:26501966

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4621043/

Abstract

BACKGROUND

Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise.

RESULTS

We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic.
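The elastic behaviour described above (a cluster built on demand, with compute nodes added or removed as required) can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Cluster` class, its method names, the node naming scheme, and the `cores_per_node` default are hypothetical and are not part of the GVL or of any cloud management tool's API. The sketch tracks cluster membership in memory and launches nothing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Toy in-memory model of an on-demand cluster: one head node plus elastic workers.

    Hypothetical illustration only; it does not call any cloud API.
    """
    name: str
    cores_per_node: int = 4  # assumed instance size, chosen for illustration
    workers: List[str] = field(default_factory=list)
    _next_id: int = field(default=0, init=False, repr=False)

    def add_workers(self, count: int) -> List[str]:
        """Grow the cluster by `count` worker nodes and return their names."""
        if count < 0:
            raise ValueError("count must be non-negative")
        added = []
        for _ in range(count):
            added.append(f"{self.name}-worker-{self._next_id}")
            self._next_id += 1
        self.workers.extend(added)
        return added

    def remove_workers(self, count: int) -> List[str]:
        """Shrink the cluster, removing the most recently added workers first."""
        if count < 0:
            raise ValueError("count must be non-negative")
        count = min(count, len(self.workers))
        split = len(self.workers) - count
        removed = self.workers[split:]
        del self.workers[split:]
        return removed

    @property
    def total_cores(self) -> int:
        """Cores available across the head node and all workers."""
        return (1 + len(self.workers)) * self.cores_per_node


if __name__ == "__main__":
    cluster = Cluster("gvl-demo")
    cluster.add_workers(3)     # scale up for a heavy analysis step
    cluster.remove_workers(1)  # scale back down afterwards
    print(cluster.workers, cluster.total_cores)
```

In a real deployment the add and remove operations would be delegated to the cloud provider through the platform's management layer. The point of the sketch is only that cluster size is a runtime decision made by the user, not a fixed property of the installation.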

CONCLUSIONS

This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.
