CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.

Authors

Chung Wei-Chun, Chen Chien-Chih, Ho Jan-Ming, Lin Chung-Yen, Hsu Wen-Lian, Wang Yu-Chun, Lee D T, Lai Feipei, Huang Chih-Wei, Chang Yu-Jung

Affiliations

Institute of Information Science, Academia Sinica, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan.

Institute of Information Science, Academia Sinica, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.

Publication information

PLoS One. 2014 Jun 4;9(6):e98146. doi: 10.1371/journal.pone.0098146. eCollection 2014.

DOI: 10.1371/journal.pone.0098146
PMID: 24897343
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4045712/

Abstract

BACKGROUND

Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce.
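
The MapReduce model described in this paragraph can be illustrated with a small single-process sketch. It is not taken from CloudDOE, CloudBurst, CloudBrush, or CloudRS, and it does not use the Hadoop API: the function names (`map_read`, `shuffle`, `reduce_counts`, `count_kmers`) and the k-mer counting task are assumptions chosen only to show the map, shuffle, and reduce phases that a framework such as Hadoop would distribute across machines.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k):
    """Map step (hypothetical): emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum all counts collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in one process; Hadoop would spread each phase over nodes."""
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Two toy reads stand in for a sequencing data set.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, 3)
print(counts)
```

Because each call to `map_read` depends only on its own read, and each call to `reduce_counts` only on its own key, both phases can run in parallel on separate machines, which is the property the paragraph above relies on.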

RESULTS

We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard.

CONCLUSIONS

CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark.

AVAILABILITY

CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/594b/4045712/c20ce069b96e/pone.0098146.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/594b/4045712/ea220d40e9f0/pone.0098146.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/594b/4045712/b3f128dd1cef/pone.0098146.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/594b/4045712/b9c85052df40/pone.0098146.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/594b/4045712/e2fa26a1319a/pone.0098146.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/594b/4045712/4ae8d8cfc347/pone.0098146.g006.jpg

Similar articles

1
CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.
PLoS One. 2014 Jun 4;9(6):e98146. doi: 10.1371/journal.pone.0098146. eCollection 2014.
2
An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.
BMC Bioinformatics. 2010 Dec 21;11 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-11-S12-S1.
3
Cloudgene: a graphical execution platform for MapReduce programs on private and public clouds.
BMC Bioinformatics. 2012 Aug 13;13:200. doi: 10.1186/1471-2105-13-200.
4
MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud.
Bioinformatics. 2017 Sep 1;33(17):2762-2764. doi: 10.1093/bioinformatics/btx307.
5
Eoulsan: a cloud computing-based framework facilitating high throughput sequencing analyses.
Bioinformatics. 2012 Jun 1;28(11):1542-3. doi: 10.1093/bioinformatics/bts165. Epub 2012 Apr 5.
6
A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.
Gigascience. 2015 Jun 4;4:26. doi: 10.1186/s13742-015-0058-5. eCollection 2015.
7
Hadoop-BAM: directly manipulating next generation sequencing data in the cloud.
Bioinformatics. 2012 Mar 15;28(6):876-7. doi: 10.1093/bioinformatics/bts054. Epub 2012 Feb 2.
8
cl-dash: rapid configuration and deployment of Hadoop clusters for bioinformatics research in the cloud.
Bioinformatics. 2016 Jan 15;32(2):301-3. doi: 10.1093/bioinformatics/btv553. Epub 2015 Oct 1.
9
Survey of MapReduce frame operation in bioinformatics.
Brief Bioinform. 2014 Jul;15(4):637-47. doi: 10.1093/bib/bbs088. Epub 2013 Feb 7.
10
CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping.
BMC Res Notes. 2011 Jun 6;4:171. doi: 10.1186/1756-0500-4-171.

Cited by

1
Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services.
J Am Med Inform Assoc. 2020 Sep 1;27(9):1425-1430. doi: 10.1093/jamia/ocaa068.
2
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.
Algorithms Mol Biol. 2017 Sep 29;12:25. doi: 10.1186/s13015-017-0116-x. eCollection 2017.
3
Big Data Application in Biomedical Research and Health Care: A Literature Review.
Biomed Inform Insights. 2016 Jan 19;8:1-10. doi: 10.4137/BII.S31559. eCollection 2016.
4
Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework.
BMC Genomics. 2015;16 Suppl 12(Suppl 12):S9. doi: 10.1186/1471-2164-16-S12-S9. Epub 2015 Dec 9.
5
cl-dash: rapid configuration and deployment of Hadoop clusters for bioinformatics research in the cloud.
Bioinformatics. 2016 Jan 15;32(2):301-3. doi: 10.1093/bioinformatics/btv553. Epub 2015 Oct 1.

References

1
SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop.
Bioinformatics. 2014 Jan 1;30(1):119-20. doi: 10.1093/bioinformatics/btt601. Epub 2013 Oct 22.
2
BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.
Bioinformatics. 2013 Dec 1;29(23):3014-9. doi: 10.1093/bioinformatics/btt528. Epub 2013 Sep 10.
3
Survey of MapReduce frame operation in bioinformatics.
Brief Bioinform. 2014 Jul;15(4):637-47. doi: 10.1093/bib/bbs088. Epub 2013 Feb 7.
4
A de novo next generation genomic sequence assembler based on string graph and MapReduce cloud computing framework.
BMC Genomics. 2012;13 Suppl 7(Suppl 7):S28. doi: 10.1186/1471-2164-13-S7-S28. Epub 2012 Dec 13.
5
Eoulsan: a cloud computing-based framework facilitating high throughput sequencing analyses.
Bioinformatics. 2012 Jun 1;28(11):1542-3. doi: 10.1093/bioinformatics/bts165. Epub 2012 Apr 5.
6
Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.
BMC Bioinformatics. 2012 Mar 19;13:42. doi: 10.1186/1471-2105-13-42.
7
An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.
BMC Bioinformatics. 2010 Dec 21;11 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-11-S12-S1.
8
Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences.
Genome Biol. 2010;11(8):R86. doi: 10.1186/gb-2010-11-8-r86. Epub 2010 Aug 25.
9
Cloud-scale RNA-sequencing differential expression analysis with Myrna.
Genome Biol. 2010;11(8):R83. doi: 10.1186/gb-2010-11-8-r83. Epub 2010 Aug 11.
10
Cloud computing and the DNA data race.
Nat Biotechnol. 2010 Jul;28(7):691-3. doi: 10.1038/nbt0710-691.