Xin Hongyi, Nahar Sunny, Zhu Richard, Emmons John, Pekhimenko Gennady, Kingsford Carl, Alkan Can, Mutlu Onur
Computer Science Department.
Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA.
Bioinformatics. 2016 Jun 1;32(11):1632-42. doi: 10.1093/bioinformatics/btv670. Epub 2015 Nov 14.
Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both of which limit the ability of the mapper in selecting less frequent seeds to speed up the mapping process. Therefore, it is crucial to develop a new algorithm that can adjust both the individual seed length and the seed placement, as well as derive less frequent seeds.
We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently-occurring set of x seeds in an L-base-pair read in [Formula: see text] operations on average and in [Formula: see text] operations in the worst case, while generating a maximum of [Formula: see text] seed frequency database lookups. We compare OSS against four state-of-the-art seed selection schemes and observe that OSS provides a 3-fold reduction in average seed frequency over the best previous seed selection optimizations.
We provide an implementation of the Optimal Seed Solver in C++ at: https://github.com/CMU-SAFARI/Optimal-Seed-Solver
hxin@cmu.edu, calkan@cs.bilkent.edu.tr or onur@cmu.edu
Supplementary data are available at Bioinformatics online.
优化种子选择是读段映射中的一个重要问题。映射器选择的非重叠种子数量决定了映射器的灵敏度,而所有选定种子的总频率决定了映射器的速度。现代的种子扩展映射器通常采用等长固定方案或固定放置方案来选择种子,这两种方案都限制了映射器选择低频种子以加快映射过程的能力。因此,开发一种能够调整单个种子长度和种子放置,并能导出低频种子的新算法至关重要。
我们提出了最优种子求解器(OSS),这是一种动态规划算法,平均在[公式:见正文]次操作中,最坏情况下在[公式:见正文]次操作中,能在一个L碱基对读段中发现x个种子的出现频率最低的集合,同时最多产生[公式:见正文]次种子频率数据库查找。我们将OSS与四种最先进的种子选择方案进行比较,发现OSS相比之前最佳的种子选择优化,平均种子频率降低了3倍。
我们在https://github.com/CMU-SAFARI/Optimal-Seed-Solver上提供了用C++实现的最优种子求解器。
hxin@cmu.edu,calkan@cs.bilkent.edu.tr或onur@cmu.edu
补充数据可在《生物信息学》在线获取。