Bhuvaneshwar Krithika, Sulakhe Dinanath, Gauba Robinder, Rodriguez Alex, Madduri Ravi, Dave Utpal, Lacinski Lukasz, Foster Ian, Gusev Yuriy, Madhavan Subha
Innovation Center for Biomedical Informatics (ICBI), Georgetown University, Washington, DC 20007, USA.
Computation Institute, University of Chicago, Argonne National Laboratory, 60637, USA; Globus Genomics, USA.
Comput Struct Biotechnol J. 2014 Nov 7;13:64-74. doi: 10.1016/j.csbj.2014.11.001. eCollection 2015.
Next generation sequencing (NGS) technologies produce massive amounts of data, requiring a powerful computational infrastructure, high-quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte-scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the "Globus Genomics" system, an enhanced Galaxy workflow system made available as a service that lets users process and transfer data easily, reliably, and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel, and it also helps meet the scale-out analysis needs of modern translational genomics research.