GreedyMini: generating low-density DNA minimizers.

Authors

Golan Shay, Tziony Ido, Kraus Matan, Orenstein Yaron, Shur Arseny

Affiliations

Department of Computer Science, University of Haifa, Haifa 3498838, Israel.

Efi Arazi School of Computer Science, Reichman University, Herzliya 4610101, Israel.

Publication information

Bioinformatics. 2025 Jul 1;41(Supplement_1):i275-i284. doi: 10.1093/bioinformatics/btaf251.

DOI: 10.1093/bioinformatics/btaf251
PMID: 40662840
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12261476/
Abstract

MOTIVATION

Minimizers are the most popular k-mer selection scheme in algorithms and data structures analyzing high-throughput sequencing (HTS) data. In a minimizer scheme, the smallest k-mer by some predefined order is selected as the representative of a sequence window containing w consecutive k-mers, which results in overlapping windows often selecting the same k-mer. Minimizers that achieve the lowest frequency of selected k-mers over a random DNA sequence, termed the expected density, are desired for improved performance of HTS analyses. Yet, no method to date exists to generate minimizers that achieve minimum expected density. Moreover, for k and w values used by common HTS algorithms and data structures, there is a gap between densities achieved by existing selection schemes and the theoretical lower bound.
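
The selection rule described above can be illustrated with a minimal sketch. It assumes a lexicographic k-mer order by default and leftmost tie-breaking; both are illustrative choices, not the orders GreedyMini produces, and the function names `minimizer_positions` and `particular_density` are hypothetical. The density of a specific sequence is computed here as the fraction of k-mer positions that are selected by at least one window.

```python
def minimizer_positions(seq, k, w, rank=None):
    """Return sorted start positions of k-mers selected as window minimizers.

    rank maps a k-mer string to a comparable value; smaller means higher priority.
    The default (lexicographic order) is an illustrative assumption.
    """
    if rank is None:
        rank = lambda kmer: kmer
    n_kmers = len(seq) - k + 1
    selected = set()
    # Each window covers w consecutive k-mers.
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # min() returns the first minimal element, so ties go to the leftmost k-mer.
        best = min(window, key=lambda i: rank(seq[i:i + k]))
        selected.add(best)
    return sorted(selected)


def particular_density(seq, k, w, rank=None):
    """Fraction of k-mer positions in seq that are selected by some window."""
    return len(minimizer_positions(seq, k, w, rank)) / (len(seq) - k + 1)


positions = minimizer_positions("ACGTACGTAC", k=3, w=4)
dens = particular_density("ACGTACGTAC", k=3, w=4)
print(positions, dens)
```

In this toy input, four of the five overlapping windows share the same minimizer, which is the effect the paragraph above describes: overlapping windows often select the same k-mer, so fewer positions are selected than there are windows.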

RESULTS

We developed GreedyMini, a toolkit of methods to generate minimizers with low expected or particular density, to improve minimizers, to extend minimizers to larger alphabets, k, and w, and to measure the expected density of a given minimizer efficiently. We demonstrate over various combinations of k and w values, including those of popular HTS methods, that GreedyMini can generate DNA minimizers that achieve expected densities very close to the lower bound, and both expected and particular densities much lower compared to existing selection schemes. Moreover, we show that GreedyMini's k-mer rank-retrieval time is comparable to common k-mer hash functions. We expect GreedyMini to improve the performance of many HTS algorithms and data structures and advance the research of k-mer selection schemes.
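
To make the notion of expected density concrete, the following sketch estimates it by sampling one long random DNA sequence. This is a plain sampling estimate written for illustration, not the measurement method implemented in GreedyMini; the names `estimate_density` and `make_random_order`, the sequence length, and the seeds are all assumptions. For reference, a uniformly random k-mer order is known to give a density of roughly 2/(w+1) when k is large enough relative to w, and no scheme that selects one k-mer per window can go below 1/w.

```python
import random


def estimate_density(rank, k, w, length=20000, seed=0):
    """Estimate expected density from one random DNA sequence (sampling sketch)."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    # Precompute the rank of every k-mer position once.
    ranks = [rank(seq[i:i + k]) for i in range(length - k + 1)]
    selected = set()
    for start in range(len(ranks) - w + 1):
        window = ranks[start:start + w]
        # list.index returns the first match, so ties go to the leftmost k-mer.
        selected.add(start + window.index(min(window)))
    return len(selected) / len(ranks)


def make_random_order(seed=1):
    """Return a rank function that assigns each k-mer a fixed pseudo-random value."""
    rng = random.Random(seed)
    table = {}

    def rank(kmer):
        if kmer not in table:
            table[kmer] = rng.random()
        return table[kmer]

    return rank


k, w = 7, 5
random_density = estimate_density(make_random_order(), k, w)
lexicographic_density = estimate_density(lambda kmer: kmer, k, w)
print(f"random order:        {random_density:.4f}")
print(f"lexicographic order: {lexicographic_density:.4f}")
print(f"2/(w+1) = {2 / (w + 1):.4f}, 1/w = {1 / w:.4f}")
```

Any candidate order can be passed as `rank`, so the same function gives a rough way to compare orders against the 2/(w+1) baseline and the 1/w floor.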

AVAILABILITY AND IMPLEMENTATION

The toolkit, its source code, and precomputed minimizers for a variety of (k,w) pairs are available via https://github.com/OrensteinLab/GreedyMini.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc35/12261476/7b0845086e9d/btaf251f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc35/12261476/fd042820caa3/btaf251f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc35/12261476/8efee41d4e07/btaf251f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc35/12261476/a4bf2445495c/btaf251f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc35/12261476/ed7b2bcaa8d0/btaf251f5.jpg
