CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.

Authors

Chung Wei-Chun, Chen Chien-Chih, Ho Jan-Ming, Lin Chung-Yen, Hsu Wen-Lian, Wang Yu-Chun, Lee D T, Lai Feipei, Huang Chih-Wei, Chang Yu-Jung

Affiliations

Institute of Information Science, Academia Sinica, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan.

Institute of Information Science, Academia Sinica, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.

Publication information

PLoS One. 2014 Jun 4;9(6):e98146. doi: 10.1371/journal.pone.0098146. eCollection 2014.

Abstract

BACKGROUND

Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce.

RESULTS

We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard.

CONCLUSIONS

CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark.

AVAILABILITY

CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.
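
ILLUSTRATION

The Background describes how a MapReduce framework distributes data and workload across machines. As a rough illustration of that programming model only, the following single-process Python sketch counts k-mers in sequencing reads using explicit map, shuffle, and reduce steps. It is not CloudDOE code and does not use the Hadoop API; the function names (`map_read`, `shuffle`, `reduce_counts`, `count_kmers`) and the sample reads are hypothetical, chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key.

    In a real framework this grouping happens across the network;
    here it is done in memory for illustration.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical toy reads, not real sequencing data.
    print(count_kmers(["ACGTAC", "GTACGT"], 3))
```

In a Hadoop deployment, the map and reduce functions would run on different nodes and the framework would handle the shuffle, which is where the reduction in latency described above comes from.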
