Informed and automated k-mer size selection for genome assembly.

Affiliation

Department of Computer Science and Engineering and Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.

Publication information

Bioinformatics. 2014 Jan 1;30(1):31-7. doi: 10.1093/bioinformatics/btt310. Epub 2013 Jun 3.

DOI:10.1093/bioinformatics/btt310
PMID:23732276
Abstract

MOTIVATION

Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision.

RESULTS

We develop a fast and accurate sampling method that constructs approximate abundance histograms with several orders of magnitude performance improvement over traditional methods. We then present a fast heuristic that uses the generated abundance histograms for putative k values to estimate the best possible value of k. We test the effectiveness of our tool using diverse sequencing datasets and find that its choice of k leads to some of the best assemblies.
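The abstract does not spell out the sampling scheme or the heuristic, so the following is only a minimal sketch of one plausible approach, not KmerGenie's actual implementation. It assumes k-mers are sampled by hash value (so every occurrence of a sampled k-mer is counted exactly) and that the number of distinct k-mers per abundance is scaled back up by the sampling rate. The function names, the `rate` and `min_abundance` parameters, and the "count k-mers seen at least twice" scoring rule are all illustrative assumptions.

```python
import hashlib
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def sampled_histogram(reads, k, rate=1):
    """Approximate k-mer abundance histogram (hypothetical sketch).

    Keeps only k-mers whose hash is divisible by `rate`, counts those
    exactly, then multiplies the number of distinct k-mers at each
    abundance by `rate`. With rate=1 the histogram is exact.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            km = canonical(read[i:i + k])
            digest = hashlib.blake2b(km.encode(), digest_size=8).digest()
            if int.from_bytes(digest, "big") % rate == 0:
                counts[km] += 1
    hist = Counter(counts.values())
    return {abundance: n * rate for abundance, n in sorted(hist.items())}

def choose_k(reads, candidate_ks, rate=1, min_abundance=2):
    """Pick the k with the most distinct k-mers seen at least `min_abundance` times.

    This is a crude stand-in for a real heuristic: it treats low-abundance
    k-mers as likely sequencing errors and the rest as likely genomic.
    """
    best_k, best_score = None, -1
    for k in candidate_ks:
        hist = sampled_histogram(reads, k, rate)
        score = sum(n for a, n in hist.items() if a >= min_abundance)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

Calling `choose_k(reads, range(21, 122, 10), rate=1000)` on a list of read strings would then scan candidate values of k while hashing every k-mer but storing counts for only about one in a thousand distinct k-mers, which is where the memory saving of such a scheme would come from.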

AVAILABILITY

Our tool KmerGenie is freely available at: http://kmergenie.bx.psu.edu/.

Similar articles

1
Informed and automated k-mer size selection for genome assembly.
Bioinformatics. 2014 Jan 1;30(1):31-7. doi: 10.1093/bioinformatics/btt310. Epub 2013 Jun 3.
2
RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly.
Bioinformatics. 2018 Apr 1;34(7):1125-1131. doi: 10.1093/bioinformatics/btx771.
3
Squeakr: an exact and approximate k-mer counting system.
Bioinformatics. 2018 Feb 15;34(4):568-575. doi: 10.1093/bioinformatics/btx636.
4
HyDA-Vista: towards optimal guided selection of k-mer size for sequence assembly.
BMC Genomics. 2014;15 Suppl 10(Suppl 10):S9. doi: 10.1186/1471-2164-15-S10-S9. Epub 2014 Dec 12.
5
A space and time-efficient index for the compacted colored de Bruijn graph.
Bioinformatics. 2018 Jul 1;34(13):i169-i177. doi: 10.1093/bioinformatics/bty292.
6
Athena: Automated Tuning of k-mer based Genomic Error Correction Algorithms using Language Models.
Sci Rep. 2019 Nov 6;9(1):16157. doi: 10.1038/s41598-019-52196-4.
7
Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads.
Nat Biotechnol. 2022 Jul;40(7):1075-1081. doi: 10.1038/s41587-022-01220-6. Epub 2022 Feb 28.
8
Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers.
Sci Rep. 2019 Oct 16;9(1):14882. doi: 10.1038/s41598-019-51284-9.
9
KmerStream: streaming algorithms for k-mer abundance estimation.
Bioinformatics. 2014 Dec 15;30(24):3541-7. doi: 10.1093/bioinformatics/btu713. Epub 2014 Oct 28.
10
Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs.
Bioinformatics. 2020 Mar 1;36(5):1374-1381. doi: 10.1093/bioinformatics/btz102.

Cited by

1
Genomic analysis highlights the conservation significance of Torenia concolor (Linderniaceae) from the periphery of its distribution range.
J Plant Res. 2025 Aug 19. doi: 10.1007/s10265-025-01659-z.
2
Chromosome-level genome assembly of Marco Polo Blister Beetle (Hycleus marcipoli).
Sci Data. 2025 Aug 9;12(1):1396. doi: 10.1038/s41597-025-05728-9.
3
Comparative Genomics and Draft Genome Assembly of the Elite Tunisian Date Palm Cultivar Deglet Nour: Insights into the Genetic Variations Linked to Fruit Ripening and Quality Traits.
Int J Mol Sci. 2025 Jul 16;26(14):6844. doi: 10.3390/ijms26146844.
4
Ulmus minor response to Dutch elm disease: de novo transcriptome assembly and annotation.
Sci Data. 2025 Jul 23;12(1):1282. doi: 10.1038/s41597-025-05539-y.
5
Genomic Analysis and Metabolite Profiling of Three Probiotic Strains for Potential Application in Aquaculture.
Prev Nutr Food Sci. 2025 Jun 30;30(3):274-284. doi: 10.3746/pnf.2025.30.3.274.
6
Chromosome-scale genome assembly of B10.RIII, an autoimmune susceptible mouse strain.
bioRxiv. 2025 May 21:2025.05.16.654505. doi: 10.1101/2025.05.16.654505.
7
Efficient De Novo Assembly and Recovery of Microbial Genomes from Complex Metagenomes Using a Reduced Set of k-mers.
Interdiscip Sci. 2025 Jun 2. doi: 10.1007/s12539-025-00722-6.
8
Chromosome-level draft genome assembly of reveals transposable element expansion reshaping the genome structure.
Front Genet. 2025 Apr 29;16:1502681. doi: 10.3389/fgene.2025.1502681. eCollection 2025.
9
Mapping-based genome size estimation.
BMC Genomics. 2025 May 14;26(1):482. doi: 10.1186/s12864-025-11640-8.
10
The haplotype-resolved T2T genome for Bauhinia × blakeana sheds light on the genetic basis of flower heterosis.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf044.