SAKE: Strobemer-assisted k-mer extraction.

Affiliation

Department of Computer Science, University of Helsinki, Helsinki, Finland.

Publication information

PLoS One. 2023 Nov 29;18(11):e0294415. doi: 10.1371/journal.pone.0294415. eCollection 2023.

DOI: 10.1371/journal.pone.0294415
PMID: 38019768
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10686461/
Abstract

K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Often the use of long k-mers is preferred because they can be uniquely associated with a specific genomic region. Unfortunately, it is not possible to reliably extract long k-mers in high error rate reads with standard exact k-mer counting methods. We propose SAKE, a method to extract long k-mers from high error rate reads by utilizing strobemers and consensus k-mer generation through partial order alignment. Our experiments show that on simulated data with up to 6% error rate, SAKE can extract 97-mers with over 90% recall. Conversely, the recall of DSK, an exact k-mer counter, drops to less than 20%. Furthermore, the precision of SAKE remains similar to DSK. On real bacterial data, SAKE retrieves 97-mers with a recall of over 90% and slightly lower precision than DSK, while the recall of DSK already drops to 50%. We show that SAKE can extract more k-mers from uncorrected high error rate reads compared to exact k-mer counting. However, exact k-mer counters run on corrected reads can extract slightly more k-mers than SAKE run on uncorrected reads.
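
To make the abstract's argument concrete, the sketch below shows why exact counting struggles with long k-mers in noisy reads. It is not SAKE and not DSK: the function names, the toy reference string, and the assumption of independent per-base errors are all illustrative choices that do not come from the paper. Under that independence assumption, a single 97-mer read at a 6% error rate is error-free with probability 0.94^97, roughly 0.25%, so most observed copies of any long k-mer contain at least one error. The code also shows how precision and recall of an extracted k-mer set can be computed against a known reference.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every length-k substring of seq (exact k-mers)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k: int) -> Counter:
    """Exact k-mer counting over a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def precision_recall(extracted: set, truth: set):
    """Precision and recall of an extracted k-mer set against a reference set."""
    true_positives = len(extracted & truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def error_free_probability(k: int, error_rate: float) -> float:
    """Chance that one k-mer occurrence has no error.

    Assumes independent per-base errors (an illustrative simplification).
    """
    return (1.0 - error_rate) ** k


# Toy example (hypothetical data): one substitution in a 25-base read.
reference = "ACGTTGCAAGGCTTAACCGGTATCG"
k = 5
truth = set(kmers(reference, k))

# Replace the base at position 12 ('T') with 'A'.
noisy_read = reference[:12] + "A" + reference[13:]
extracted = set(count_kmers([noisy_read], k))

precision, recall = precision_recall(extracted, truth)
lost = len(truth - extracted)  # every k-mer overlapping the error is lost

p97 = error_free_probability(97, 0.06)
```

In this toy case the single substitution removes all k = 5 reference k-mers that overlap it and replaces them with spurious ones, which lowers both precision and recall. With k = 97 the same error would affect up to 97 k-mers, which is the effect that motivates consensus-based extraction for long k-mers.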
