Hodor Paul, Chawla Amandeep, Clark Andrew, Neal Lauren
Booz Allen Hamilton, Rockville, MD 20852, USA.
Bioinformatics. 2016 Jan 15;32(2):301-3. doi: 10.1093/bioinformatics/btv553. Epub 2015 Oct 1.
One of the solutions proposed for addressing the challenge of the overwhelming abundance of genomic sequence and other biological data is the use of the Hadoop computing framework. Appropriate tools are needed to set up computational environments that facilitate research of novel bioinformatics methodology using Hadoop. Here, we present cl-dash, a complete starter kit for setting up such an environment. Configuring and deploying new Hadoop clusters can be done in minutes. Use of Amazon Web Services ensures no initial investment and minimal operation costs. Two sample bioinformatics applications help the researcher understand and learn the principles of implementing an algorithm using the MapReduce programming pattern.
Source code is available at https://bitbucket.org/booz-allen-sci-comp-team/cl-dash.git.
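To illustrate the MapReduce programming pattern mentioned in the abstract, the following is a minimal single-process sketch in Python that counts k-mers in a set of reads. It is not taken from cl-dash and is not claimed to match either of its sample applications. The function names (map_kmers, shuffle, reduce_counts, count_kmers) and the choice of k-mer counting are assumptions made purely for illustration, and the shuffle step here only imitates in memory what Hadoop would do across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in sequence over all reads (hypothetical driver)."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input reads, chosen only to show the flow of data.
    print(count_kmers(["ACGT", "CGTA"], 2))
```

The point of the sketch is the separation of concerns: the map and reduce functions each see only one record or one key at a time, which is what allows a framework such as Hadoop to distribute them over many nodes.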