Zou Quan, Li Xu-Bin, Jiang Wen-Rui, Lin Zi-Yu, Li Gui-Lin, Chen Ke
Brief Bioinform. 2014 Jul;15(4):637-47. doi: 10.1093/bib/bbs088. Epub 2013 Feb 7.
Traditional analysis tools have difficulty processing the large-scale data produced by high-throughput sequencing, which poses a challenge for bioinformatics. The open-source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce framework-based applications that can be employed in next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as future work on parallel computing in bioinformatics.
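For readers unfamiliar with the MapReduce model mentioned in the abstract, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to k-mer counting over sequencing reads. It is an illustration only and is not taken from the article: it does not use Hadoop or any Hadoop API, and the function names (`map_kmers`, `shuffle`, `reduce_counts`, `count_kmers`) and the toy reads are hypothetical. On a real cluster, the framework would run the map and reduce steps in parallel across nodes and perform the shuffle itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map step: emit (k-mer, 1) for every k-length window in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key (done by the framework on a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the partial counts collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence inside a single process."""
    emitted = (pair for read in reads for pair in map_kmers(read, k))
    grouped = shuffle(emitted)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical toy reads, used only to exercise the sketch.
    toy_reads = ["ACGTAC", "GTACGT"]
    print(count_kmers(toy_reads, 3))
```

Because each read is mapped independently and each k-mer is reduced independently, both steps can be distributed across machines without coordination beyond the shuffle, which is the property that makes the model attractive for large sequencing datasets.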