Error tolerant indexing and alignment of short reads with covering template families.

Author information

Giladi Eldar, Healy John, Myers Gene, Hart Chris, Kapranov Philipp, Lipson Doron, Roels Steve, Thayer Edward, Letovsky Stan

Affiliation

Helicos BioSciences Corporation, Cambridge, Massachusetts 02139, USA.

Publication information

J Comput Biol. 2010 Oct;17(10):1397-1411. doi: 10.1089/cmb.2010.0005.

Abstract

The rapid adoption of high-throughput next generation sequence data in biological research is presenting a major challenge for sequence alignment tools—specifically, the efficient alignment of vast amounts of short reads to large references in the presence of differences arising from sequencing errors and biological sequence variations. To address this challenge, we developed a short read aligner for high-throughput sequencer data that is tolerant of errors or mutations of all types—namely, substitutions, deletions, and insertions. The aligner utilizes a multi-stage approach in which template-based indexing is used to identify candidate regions for alignment with dynamic programming. A template is a pair of gapped seeds, with one used with the read and one used with the reference. In this article, we focus on the development of template families that yield error-tolerant indexing up to a given error-budget. A general algorithm for finding those families is presented, and a recursive construction that creates families with higher error tolerance from ones with a lower error tolerance is developed.
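The covering property that a template family must satisfy can be illustrated with a minimal sketch, restricted to substitutions only. This is not the construction from the article: it uses the simple pigeonhole idea of splitting the read into budget + 1 blocks, and it applies a single mask to both read and reference rather than a pair of gapped seeds, so insertions and deletions are not modeled. The function names `block_family` and `covers`, and the "1"/"0" mask encoding, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def block_family(read_len, budget):
    """Hypothetical pigeonhole family: budget + 1 masks, one per contiguous block.

    Each mask is a string of '1' (sampled position) and '0' (ignored position).
    """
    bounds = [i * read_len // (budget + 1) for i in range(budget + 2)]
    return [
        "0" * lo + "1" * (hi - lo) + "0" * (read_len - hi)
        for lo, hi in zip(bounds, bounds[1:])
    ]


def covers(masks, read_len, budget):
    """Brute-force check of the covering property for substitutions.

    Returns True if, for every placement of up to `budget` substitution
    errors in a read of length `read_len`, at least one mask samples only
    error-free positions (so that mask's key would still hit the index).
    """
    sampled = [frozenset(i for i, c in enumerate(m) if c == "1") for m in masks]
    for k in range(budget + 1):
        for errors in combinations(range(read_len), k):
            if not any(s.isdisjoint(errors) for s in sampled):
                return False
    return True


family = block_family(12, 2)
print(family)                      # three masks, one per block of 4 positions
print(covers(family, 12, 2))       # full family against a budget of 2
print(covers(family[:2], 12, 2))   # family with one mask removed
```

With a read length of 12 and a budget of 2, the full family of three masks passes the check, while dropping any one mask breaks it, since one error in each remaining block defeats every mask. The brute-force check enumerates all error placements, so it is practical only for small lengths and budgets.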
