Survey of MapReduce frame operation in bioinformatics.

Author information

Zou Quan, Li Xu-Bin, Jiang Wen-Rui, Lin Zi-Yu, Li Gui-Lin, Chen Ke

Publication information

Brief Bioinform. 2014 Jul;15(4):637-47. doi: 10.1093/bib/bbs088. Epub 2013 Feb 7.

Abstract

Bioinformatics is challenged by the fact that traditional analysis tools have difficulty processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present applications based on the MapReduce framework that can be employed in next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as future work on parallel computing in bioinformatics.

Similar articles

Survey of MapReduce frame operation in bioinformatics.

Brief Bioinform. 2014 Jul;15(4):637-47. doi: 10.1093/bib/bbs088. Epub 2013 Feb 7.

CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.

PLoS One. 2014 Jun 4;9(6):e98146. doi: 10.1371/journal.pone.0098146. eCollection 2014.

An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.

BMC Bioinformatics. 2010 Dec 21;11 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-11-S12-S1.

Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.

BioData Min. 2014 Oct 29;7:22. doi: 10.1186/1756-0381-7-22. eCollection 2014.

A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.

Gigascience. 2015 Jun 4;4:26. doi: 10.1186/s13742-015-0058-5. eCollection 2015.

cl-dash: rapid configuration and deployment of Hadoop clusters for bioinformatics research in the cloud.

Bioinformatics. 2016 Jan 15;32(2):301-3. doi: 10.1093/bioinformatics/btv553. Epub 2015 Oct 1.

Hybrid cloud and cluster computing paradigms for life science applications.

BMC Bioinformatics. 2010 Dec 21;11 Suppl 12(Suppl 12):S3. doi: 10.1186/1471-2105-11-S12-S3.

Parallel MapReduce: Maximizing Cloud Resource Utilization and Performance Improvement Using Parallel Execution Strategies.

Biomed Res Int. 2018 Oct 17;2018:7501042. doi: 10.1155/2018/7501042. eCollection 2018.

Bioinformatics applications on Apache Spark.

Gigascience. 2018 Aug 1;7(8):giy098. doi: 10.1093/gigascience/giy098.

STDADS: An Efficient Slow Task Detection Algorithm for Deadline Schedulers.

Big Data. 2020 Feb;8(1):62-69. doi: 10.1089/big.2019.0039. Epub 2020 Jan 29.

Cited by

DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features.

Appl Bionics Biomech. 2022 Apr 13;2022:5483115. doi: 10.1155/2022/5483115. eCollection 2022.

RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor.

BMC Bioinformatics. 2022 Apr 7;23(1):123. doi: 10.1186/s12859-022-04648-4.

BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data.

Front Big Data. 2022 Jan 18;4:727216. doi: 10.3389/fdata.2021.727216. eCollection 2021.

Cloud Computing Enabled Big Multi-Omics Data Analytics.

Bioinform Biol Insights. 2021 Jul 28;15:11779322211035921. doi: 10.1177/11779322211035921. eCollection 2021.

A Genocentric Approach to Discovery of Mendelian Disorders.

Am J Hum Genet. 2019 Nov 7;105(5):974-986. doi: 10.1016/j.ajhg.2019.09.027. Epub 2019 Oct 24.

Perspectives of Bioinformatics in Big Data Era.

Curr Genomics. 2019 Feb;20(2):79-80.

Gradient Boosting Decision Tree-Based Method for Predicting Interactions Between Target Genes and Drugs.

Front Genet. 2019 May 31;10:459. doi: 10.3389/fgene.2019.00459. eCollection 2019.

Identification of Phage Viral Proteins With Hybrid Sequence Features.

Front Microbiol. 2019 Mar 26;10:507. doi: 10.3389/fmicb.2019.00507. eCollection 2019.

Inferring Bacterial Infiltration in Primary Colorectal Tumors From Host Whole Genome Sequencing Data.

Front Genet. 2019 Mar 15;10:213. doi: 10.3389/fgene.2019.00213. eCollection 2019.

Machine Learning and Integrative Analysis of Biomedical Big Data.

Genes (Basel). 2019 Jan 28;10(2):87. doi: 10.3390/genes10020087.
