Error tolerant indexing and alignment of short reads with covering template families.

Authors

Giladi Eldar, Healy John, Myers Gene, Hart Chris, Kapranov Philipp, Lipson Doron, Roels Steve, Thayer Edward, Letovsky Stan

Affiliation

Helicos BioSciences Corporation, Cambridge, Massachusetts 02139, USA.

Publication

J Comput Biol. 2010 Oct;17(10):1397-1411. doi: 10.1089/cmb.2010.0005.

DOI:10.1089/cmb.2010.0005
PMID:20937014
Abstract

The rapid adoption of high-throughput next generation sequence data in biological research is presenting a major challenge for sequence alignment tools—specifically, the efficient alignment of vast amounts of short reads to large references in the presence of differences arising from sequencing errors and biological sequence variations. To address this challenge, we developed a short read aligner for high-throughput sequencer data that is tolerant of errors or mutations of all types—namely, substitutions, deletions, and insertions. The aligner utilizes a multi-stage approach in which template-based indexing is used to identify candidate regions for alignment with dynamic programming. A template is a pair of gapped seeds, with one used with the read and one used with the reference. In this article, we focus on the development of template families that yield error-tolerant indexing up to a given error-budget. A general algorithm for finding those families is presented, and a recursive construction that creates families with higher error tolerance from ones with a lower error tolerance is developed.
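The abstract defines a template as a pair of gapped seeds (one sampled from the read, one from the reference) and asks for template families that remain error-tolerant up to a given error budget. The sketch below is a brute-force checker for that property, not the general algorithm or recursive construction from the paper. It assumes that an error pattern is an edit transcript over M (match), S (substitution), I (extra base in the read) and D (base missing from the read), that each S, I or D costs one error, and that a template "hits" when every sampled read position is aligned as an exact match to its paired sampled reference position. All function names are hypothetical.

```python
# Brute-force check that a template family is error-tolerant for a given
# read length and error budget. Illustrative sketch, not the paper's algorithm.


def matched_pairs(transcript):
    """Map an edit transcript (M=match, S=substitution, I=extra read base,
    D=missing read base) to the set of (read_pos, ref_pos) exact-match pairs."""
    r = g = 0
    pairs = set()
    for op in transcript:
        if op == "M":
            pairs.add((r, g))
        if op in "MSI":
            r += 1  # op consumes a read base
        if op in "MSD":
            g += 1  # op consumes a reference base
    return pairs


def template_hits(template, pairs):
    """A template (read_offsets, ref_offsets) hits if, for some anchor, every
    sampled read position is matched to its paired sampled reference position."""
    read_offs, ref_offs = template
    for r, g in pairs:
        i, t = r - read_offs[0], g - ref_offs[0]
        if all((i + p, t + q) in pairs for p, q in zip(read_offs, ref_offs)):
            return True
    return False


def transcripts(read_len, max_errors):
    """Yield every transcript consuming read_len read bases with <= max_errors edits."""
    def extend(prefix, consumed, errors):
        if consumed == read_len:
            yield "".join(prefix)
            return
        for op in "MSID":
            is_error = op != "M"
            if is_error and errors == max_errors:
                continue
            prefix.append(op)
            yield from extend(prefix, consumed + (op != "D"), errors + is_error)
            prefix.pop()

    yield from extend([], 0, 0)


def covers(family, read_len, max_errors):
    """True if every error pattern within the budget is hit by some template."""
    return all(
        any(template_hits(t, matched_pairs(tr)) for t in family)
        for tr in transcripts(read_len, max_errors)
    )


if __name__ == "__main__":
    ungapped4 = (tuple(range(4)), tuple(range(4)))
    ungapped5 = (tuple(range(5)), tuple(range(5)))
    print(covers([ungapped4], 12, 2))  # True
    print(covers([ungapped5], 12, 2))  # False, e.g. "MMMMSMMMMSMM" is missed
```

The first result follows from a pigeonhole argument: with at most 2 edits, a 12-base read keeps at least 10 matches split into at most 3 runs, so some run contains 4 consecutive matches. A template whose two seeds differ, such as read offsets (0, 1, 3, 4) paired with reference offsets (0, 1, 2, 3), hits the transcript "MMIMM" even though no 4-base ungapped seed does, which illustrates why a template uses a separate seed for each side. Exhaustive enumeration grows quickly with read length and error budget, so this checker is only practical for small parameters.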

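The multi-stage approach in the abstract, where template hits nominate candidate regions and dynamic programming performs the actual alignment, can be sketched as below. This is a minimal illustration under stated assumptions: a single template whose read and reference seeds both start at offset 0, a plain hash index keyed on the reference-side seed, and unit-cost semi-global edit distance as the verification step. The function names, the toy sequences and the window padding rule are hypothetical and do not describe the aligner's actual data structures or scoring.

```python
from collections import defaultdict

# Illustrative two-stage pipeline: template-based candidate lookup, then
# dynamic-programming verification. Not the authors' implementation.


def seed_key(seq, start, offsets):
    """Concatenate the bases of seq sampled at start + each offset."""
    return "".join(seq[start + o] for o in offsets)


def build_index(reference, ref_offsets):
    """Hash every placement of the reference-side seed to its start positions."""
    span = max(ref_offsets) + 1
    index = defaultdict(list)
    for t in range(len(reference) - span + 1):
        index[seed_key(reference, t, ref_offsets)].append(t)
    return index


def candidate_starts(read, index, template):
    """Look up each read-side seed placement; a hit at read i, reference t
    implies the read starts near t - i (assumes both seeds start at offset 0)."""
    read_offs, _ = template
    span = max(read_offs) + 1
    starts = set()
    for i in range(len(read) - span + 1):
        for t in index.get(seed_key(read, i, read_offs), []):
            starts.add(t - i)
    return sorted(starts)


def semi_global_distance(read, window):
    """Unit-cost edit distance aligning the whole read to any substring of window."""
    prev = [0] * (len(window) + 1)  # row 0: the read may start anywhere in window
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(window)
        for j in range(1, len(window) + 1):
            cur[j] = min(
                prev[j - 1] + (read[i - 1] != window[j - 1]),  # match / substitution
                prev[j] + 1,     # read base absent from reference (insertion)
                cur[j - 1] + 1,  # reference base absent from read (deletion)
            )
        prev = cur
    return min(prev)  # the read may end anywhere in window


def align(read, reference, template, index, max_errors):
    """Return (window_start, distance) for each candidate verified within budget."""
    hits = []
    for s in candidate_starts(read, index, template):
        lo = max(0, s - max_errors)
        hi = min(len(reference), s + len(read) + max_errors)
        d = semi_global_distance(read, reference[lo:hi])
        if d <= max_errors:
            hits.append((lo, d))
    return hits


if __name__ == "__main__":
    reference = "GGGGACGTTGCAGGGG"
    template = ((0, 1, 2), (0, 1, 2))  # hypothetical ungapped 3-base template
    index = build_index(reference, template[1])
    print(align("ACGTAGCA", reference, template, index, max_errors=1))  # [(3, 1)]
```

The split mirrors the division of labour the abstract describes: the index lookup is cheap but only proposes diagonals, while the quadratic dynamic-programming step runs only on the small padded windows around those candidates and is what actually tolerates substitutions, insertions and deletions.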

Similar articles

1
Error tolerant indexing and alignment of short reads with covering template families.
J Comput Biol. 2010 Oct;17(10):1397-1411. doi: 10.1089/cmb.2010.0005.
2
Correction of sequencing errors in a mixed set of reads.
Bioinformatics. 2010 May 15;26(10):1284-90. doi: 10.1093/bioinformatics/btq151. Epub 2010 Apr 8.
3
Optimal spliced alignments of short sequence reads.
Bioinformatics. 2008 Aug 15;24(16):i174-80. doi: 10.1093/bioinformatics/btn300.
4
EDAR: an efficient error detection and removal algorithm for next generation sequencing data.
J Comput Biol. 2010 Nov;17(11):1549-60. doi: 10.1089/cmb.2010.0127. Epub 2010 Oct 25.
5
Illuminator, a desktop program for mutation detection using short-read clonal sequencing.
Genomics. 2011 Oct;98(4):302-9. doi: 10.1016/j.ygeno.2011.05.004. Epub 2011 May 19.
6
Analysis of high-throughput sequencing data.
Methods Mol Biol. 2011;678:1-11. doi: 10.1007/978-1-60761-682-5_1.
7
Microindel detection in short-read sequence data.
Bioinformatics. 2010 Mar 15;26(6):722-9. doi: 10.1093/bioinformatics/btq027. Epub 2010 Feb 9.
8
De novo sequencing of plant genomes using second-generation technologies.
Brief Bioinform. 2009 Nov;10(6):609-18. doi: 10.1093/bib/bbp039.
9
Reptile: representative tiling for short read error correction.
Bioinformatics. 2010 Oct 15;26(20):2526-33. doi: 10.1093/bioinformatics/btq468. Epub 2010 Aug 16.
10
Artificial duplicate reads in sequencing data of 454 Genome Sequencer FLX System.
Acta Biochim Biophys Sin (Shanghai). 2011 Jun;43(6):496-500. doi: 10.1093/abbs/gmr030. Epub 2011 May 4.

Cited by

1
A survey of mapping algorithms in the long-reads era.
Genome Biol. 2023 Jun 1;24(1):133. doi: 10.1186/s13059-023-02972-3.
2
Entropy predicts sensitivity of pseudorandom seeds.
Genome Res. 2023 Jul;33(7):1162-1174. doi: 10.1101/gr.277645.123. Epub 2023 May 22.
3
Effective sequence similarity detection with strobemers.
Genome Res. 2021 Nov;31(11):2080-2094. doi: 10.1101/gr.275648.121. Epub 2021 Oct 19.
4
RNA sequencing of blood in coronary artery disease: involvement of regulatory T cell imbalance.
BMC Med Genomics. 2021 Sep 3;14(1):216. doi: 10.1186/s12920-021-01062-2.
5
Very long intergenic non-coding (vlinc) RNAs directly regulate multiple genes in cis and trans.
BMC Biol. 2021 May 20;19(1):108. doi: 10.1186/s12915-021-01044-x.
6
Novel approach reveals genomic landscapes of single-strand DNA breaks with nucleotide resolution in human cells.
Nat Commun. 2019 Dec 20;10(1):5799. doi: 10.1038/s41467-019-13602-7.
7
Diversification of Retinoblastoma Protein Function Associated with Cis and Trans Adaptations.
Mol Biol Evol. 2019 Dec 1;36(12):2790-2804. doi: 10.1093/molbev/msz187.
8
Identification of novel GLI1 target genes and regulatory circuits in human cancer cells.
Mol Oncol. 2018 Oct;12(10):1718-1734. doi: 10.1002/1878-0261.12366. Epub 2018 Aug 30.
9
Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds.
Algorithms Mol Biol. 2017 Feb 14;12:1. doi: 10.1186/s13015-017-0092-1. eCollection 2017.
10
A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances.
J Comput Biol. 2014 Dec;21(12):947-63. doi: 10.1089/cmb.2014.0173.