Optimal seed solver: optimizing seed selection in read mapping.

Authors

Xin Hongyi, Nahar Sunny, Zhu Richard, Emmons John, Pekhimenko Gennady, Kingsford Carl, Alkan Can, Mutlu Onur

Affiliations

Computer Science Department.

Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA.

Publication information

Bioinformatics. 2016 Jun 1;32(11):1632-42. doi: 10.1093/bioinformatics/btv670. Epub 2015 Nov 14.

Abstract

MOTIVATION

Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both of which limit the ability of the mapper in selecting less frequent seeds to speed up the mapping process. Therefore, it is crucial to develop a new algorithm that can adjust both the individual seed length and the seed placement, as well as derive less frequent seeds.

RESULTS

We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently-occurring set of x seeds in an L-base-pair read in [Formula: see text] operations on average and in [Formula: see text] operations in the worst case, while generating a maximum of [Formula: see text] seed frequency database lookups. We compare OSS against four state-of-the-art seed selection schemes and observe that OSS provides a 3-fold reduction in average seed frequency over the best previous seed selection optimizations.
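
The abstract does not give the recurrence, so the following is only a naive illustration of the problem OSS addresses, not the authors' algorithm. It is a minimal Python sketch under stated assumptions: the x seeds tile the read consecutively, each seed has a hypothetical minimum length `min_len`, and seed frequency is obtained by counting occurrences in a toy reference string rather than querying a seed frequency database. The names `count_occurrences` and `least_frequent_seeds` are made up for this example. The sketch tries every divider position, so it omits whatever optimizations OSS uses to reach the bounds stated above.

```python
from functools import lru_cache


def count_occurrences(reference: str, seed: str) -> int:
    """Count (possibly overlapping) occurrences of seed in reference.

    Stand-in for a seed frequency database lookup (an assumption of this sketch).
    """
    count = 0
    start = reference.find(seed)
    while start != -1:
        count += 1
        start = reference.find(seed, start + 1)
    return count


def least_frequent_seeds(read: str, num_seeds: int, min_len: int, reference: str):
    """Split read into num_seeds consecutive seeds with minimum total frequency.

    Returns (total_frequency, list_of_seeds). Naive dynamic programming that
    evaluates every divider position; for illustration only.
    """
    length = len(read)
    if num_seeds < 1 or min_len < 1 or num_seeds * min_len > length:
        raise ValueError("read is too short for the requested seeds")

    @lru_cache(maxsize=None)
    def freq(start: int, end: int) -> int:
        # Cache lookups so each substring of the read is counted only once.
        return count_occurrences(reference, read[start:end])

    inf = float("inf")
    # best[k][i]: minimum total frequency of k seeds that exactly tile read[:i].
    best = [[inf] * (length + 1) for _ in range(num_seeds + 1)]
    # cut[k][i]: start position of the k-th seed in the best tiling of read[:i].
    cut = [[-1] * (length + 1) for _ in range(num_seeds + 1)]
    best[0][0] = 0

    for k in range(1, num_seeds + 1):
        for i in range(k * min_len, length + 1):
            for j in range((k - 1) * min_len, i - min_len + 1):
                if best[k - 1][j] == inf:
                    continue
                candidate = best[k - 1][j] + freq(j, i)
                if candidate < best[k][i]:
                    best[k][i] = candidate
                    cut[k][i] = j

    # Walk the divider positions backwards to recover the seeds.
    seeds = []
    end = length
    for k in range(num_seeds, 0, -1):
        start = cut[k][end]
        seeds.append(read[start:end])
        end = start
    seeds.reverse()
    return best[num_seeds][length], seeds


if __name__ == "__main__":
    toy_reference = "ACGTACGTACGTTTGACCA"  # hypothetical reference sequence
    toy_read = "ACGTACGTTTGA"              # hypothetical read
    total, seeds = least_frequent_seeds(toy_read, 3, 2, toy_reference)
    print(total, seeds)
```

Because the dividers are free to move, the seeds may have different lengths. That flexibility is what lets the search avoid frequent substrings which a fixed-length, fixed-placement scheme would be forced to select.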

AVAILABILITY AND IMPLEMENTATION

We provide an implementation of the Optimal Seed Solver in C++ at: https://github.com/CMU-SAFARI/Optimal-Seed-Solver

CONTACT

hxin@cmu.edu, calkan@cs.bilkent.edu.tr or onur@cmu.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
