Informed and automated k-mer size selection for genome assembly.

Author information

Department of Computer Science and Engineering and Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.

Publication information

Bioinformatics. 2014 Jan 1;30(1):31-7. doi: 10.1093/bioinformatics/btt310. Epub 2013 Jun 3.

Abstract

MOTIVATION

Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision.

RESULTS

We develop a fast and accurate sampling method that constructs approximate abundance histograms with several orders of magnitude performance improvement over traditional methods. We then present a fast heuristic that uses the generated abundance histograms for putative k values to estimate the best possible value of k. We test the effectiveness of our tool using diverse sequencing datasets and find that its choice of k leads to some of the best assemblies.
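
As a rough illustration of what a sampled k-mer abundance histogram looks like, the sketch below counts only the k-mers whose hash falls in a 1-in-`sample_rate` bucket and then scales the histogram back up. This is a minimal sketch under stated assumptions, not KmerGenie's implementation: the abstract does not describe the sampling scheme in detail, and the function names, the `sample_rate` parameter, and the choice of hash are all hypothetical. Reverse-complement (canonical) k-mers are ignored for brevity.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (assumed choice, for illustration)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def abundance_histogram(reads, k, sample_rate=1):
    """Return {abundance: estimated number of distinct k-mers}.

    Only k-mers whose hash is divisible by sample_rate are counted, so
    roughly 1 in sample_rate distinct k-mers is tracked. Each sampled
    k-mer keeps its exact abundance; the number of k-mers per abundance
    is multiplied by sample_rate to estimate the full histogram.
    sample_rate=1 gives the exact histogram.
    """
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer_hash(kmer) % sample_rate == 0:
                counts[kmer] += 1
    hist = Counter(counts.values())
    return {a: n * sample_rate for a, n in sorted(hist.items())}


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACGTA", "TTGACGTA"]
    for k in (3, 4, 5):
        print(k, abundance_histogram(reads, k))
```

Because sampling is by k-mer hash rather than by read position, every occurrence of a sampled k-mer is counted, so the abundance of each sampled k-mer stays exact and only the number of distinct k-mers per abundance is approximated. Running the function once per candidate k yields the set of histograms that a k-selection heuristic would then compare.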

AVAILABILITY

Our tool KmerGenie is freely available at: http://kmergenie.bx.psu.edu/.
