Fractal MapReduce decomposition of sequence alignment.

Authors

Almeida Jonas S, Grüneberg Alexander, Maass Wolfgang, Vinga Susana

Affiliation

Div Informatics, Dept Pathology, University of Alabama at Birmingham, USA.

Publication

Algorithms Mol Biol. 2012 May 2;7(1):12. doi: 10.1186/1748-7188-7-12.

DOI:10.1186/1748-7188-7-12

PMID:22551205

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3394223/

Abstract

BACKGROUND

The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, position the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm finds a natural distribution by using map functions to process vectorized components, followed by a reduce step that aggregates the intermediate results. However, for some data analysis procedures, such as sequence analysis, a more fundamental reformulation may be required.
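As a minimal illustration of the map-then-reduce pattern described above (not code from the paper), the sketch below counts nucleotides by mapping a counting function over independent chunks of a sequence and then reducing the partial counts into one total. The function names and the chunk size are hypothetical choices made for this example.

```python
from collections import Counter
from functools import reduce


def chunks(seq, size):
    """Split a sequence into consecutive pieces of at most `size` symbols."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def count_symbols(chunk):
    """Map step: count the symbols of one chunk, independently of the others."""
    return Counter(chunk)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def symbol_counts(seq, chunk_size=4):
    """Run the map over all chunks, then reduce the intermediate results."""
    partial = map(count_symbols, chunks(seq, chunk_size))
    return reduce(merge_counts, partial, Counter())


if __name__ == "__main__":
    print(symbol_counts("ACGTACGGTT"))
```

Because each chunk is counted without reference to any other, the map calls could run on separate workers; only the reduce step needs to see more than one chunk's result.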

RESULTS

In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique that has been found to provide an "alignment-free" solution to sequence analysis and comparison: that is, a solution that does not require dynamic programming and relies instead on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated by calculating the length of the longest similar segment from only the Universal Sequence Map (USM) coordinates of two analogous units, with no resort to dynamic programming.
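The idea of reading similarity off CGR coordinates can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: it uses only a forward iterated map on the unit square, a hypothetical assignment of nucleotides to corners, and a hypothetical helper `shared_suffix_estimate`. Each step moves the current point halfway toward the corner of the incoming symbol, so two positions preceded by the same k symbols end up within 2^-k of each other in every dimension. Taking floor(-log2) of the largest coordinate difference therefore gives an estimate of the shared segment length that never falls below the true value but can exceed it.

```python
from math import floor, log2

# Assumption of this sketch: one possible assignment of symbols to corners.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 0.0), "T": (1.0, 1.0)}


def cgr_forward(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after each symbol of `seq`."""
    x, y = start
    coords = []
    for symbol in seq:
        cx, cy = CORNERS[symbol]
        # Iterated map: move halfway toward the corner of the current symbol.
        x, y = (x + cx) / 2, (y + cy) / 2
        coords.append((x, y))
    return coords


def shared_suffix_estimate(p, q):
    """Estimate how many identical symbols precede two positions,
    using only their coordinates (may overestimate, never underestimates)."""
    d = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    if d == 0:
        return float("inf")  # coordinates coincide: identical histories
    return floor(-log2(d))


if __name__ == "__main__":
    a = cgr_forward("ACGTTGCA")
    b = cgr_forward("GGGTTGCA")
    # The two sequences end with the same six symbols, "GTTGCA".
    print(shared_suffix_estimate(a[-1], b[-1]))  # prints 6
```

Each pairwise comparison touches only two coordinate pairs, so the comparisons can be distributed as map operations and their results combined by a reduce such as `max`. Plain floating-point coordinates limit this sketch to short sequences, since each additional symbol consumes one more bit of precision.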

CONCLUSIONS

The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing.

AVAILABILITY

The accompanying software library is publicly distributed as open source under version control at http://usm.github.com. It is also available as a webApp through Google Chrome's WebStore, http://chrome.google.com/webstore: search for "usm".
