
A survey of k-mer methods and applications in bioinformatics.

Authors

Moeckel Camille, Mareboina Manvita, Konnaris Maxwell A, Chan Candace S Y, Mouratidis Ioannis, Montgomery Austin, Chantzi Nikol, Pavlopoulos Georgios A, Georgakopoulos-Soares Ilias

Affiliations

Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA.

Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA.

Publication details

Comput Struct Biotechnol J. 2024 May 21;23:2289-2303. doi: 10.1016/j.csbj.2024.05.025. eCollection 2024 Dec.

DOI:10.1016/j.csbj.2024.05.025
PMID:38840832
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11152613/
Abstract

The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
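
As a rough illustration of the two ideas the abstract names (this sketch is not taken from the review itself), a k-mer is a length-k substring of a sequence, and an absent sequence is a length-k string that never occurs in it. The function names `count_kmers` and `find_absent_kmers` and the toy sequence are hypothetical, and the sketch makes simplifying assumptions: it scans a single strand only (no reverse complement) and enumerates all 4^k candidates, which is practical only for small k.

```python
from collections import Counter
from itertools import product


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping length-k substring (k-mer) of `sequence`."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one position at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def find_absent_kmers(sequence: str, k: int, alphabet: str = "ACGT") -> list[str]:
    """Return, in sorted order, the length-k strings over `alphabet`
    that never occur in `sequence` (single strand only; an assumption)."""
    observed = count_kmers(sequence, k)
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(c for c in candidates if c not in observed)


if __name__ == "__main__":
    toy = "ACGTACGTTG"  # made-up sequence for illustration
    print(count_kmers(toy, 2))
    print(find_absent_kmers(toy, 2))
```

Passing a protein alphabet through the `alphabet` parameter gives the peptide analogue of the same check; tools built for real genomes use far more memory-efficient structures than a plain `Counter`.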

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dafc/11152613/a6eec6f7a6a5/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dafc/11152613/f7f53a673589/ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dafc/11152613/d86dc689c170/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dafc/11152613/75b5ebdd2e52/gr2.jpg

Similar articles

1
A survey of k-mer methods and applications in bioinformatics.
Comput Struct Biotechnol J. 2024 May 21;23:2289-2303. doi: 10.1016/j.csbj.2024.05.025. eCollection 2024 Dec.
2
Absent from DNA and protein: genomic characterization of nullomers and nullpeptides across functional categories and evolution.
Genome Biol. 2021 Aug 25;22(1):245. doi: 10.1186/s13059-021-02459-z.
3
SAKE: Strobemer-assisted k-mer extraction.
PLoS One. 2023 Nov 29;18(11):e0294415. doi: 10.1371/journal.pone.0294415. eCollection 2023.
4
Effective sequence similarity detection with strobemers.
Genome Res. 2021 Nov;31(11):2080-2094. doi: 10.1101/gr.275648.121. Epub 2021 Oct 19.
5
Methods for Pangenomic Core Detection.
Methods Mol Biol. 2024;2802:73-106. doi: 10.1007/978-1-0716-3838-5_4.
6
Estimating the k-mer Coverage Frequencies in Genomic Datasets: A Comparative Assessment of the State-of-the-art.
Curr Genomics. 2019 Jan;20(1):2-15. doi: 10.2174/1389202919666181026101326.
7
These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure.
PLoS One. 2014 Jul 25;9(7):e101271. doi: 10.1371/journal.pone.0101271. eCollection 2014.
8
kmer2vec: A Novel Method for Comparing DNA Sequences by word2vec Embedding.
J Comput Biol. 2022 Sep;29(9):1001-1021. doi: 10.1089/cmb.2021.0536. Epub 2022 May 20.
9
ntCard: a streaming algorithm for cardinality estimation in genomics data.
Bioinformatics. 2017 May 1;33(9):1324-1330. doi: 10.1093/bioinformatics/btw832.
10
Fast Approximation of Frequent k-Mers and Applications to Metagenomics.
J Comput Biol. 2020 Apr;27(4):534-549. doi: 10.1089/cmb.2019.0314. Epub 2019 Dec 20.

Cited by

1
Multimodal Deep Learning for Generating Potential Anti-Dengue Peptides.
ACS Omega. 2025 Aug 19;10(34):38653-38674. doi: 10.1021/acsomega.5c03510. eCollection 2025 Sep 2.
2
Taxonomic quasi-primes: peptides charting lineage-specific adaptations and disease-relevant loci.
Protein Sci. 2025 Sep;34(9):e70241. doi: 10.1002/pro.70241.
3
Genomic language models with k-mer tokenization strategies for plant genome annotation and regulatory element strength prediction.
Plant Mol Biol. 2025 Jul 31;115(4):100. doi: 10.1007/s11103-025-01604-7.
4
Novel insecticide resistance mutations associated with variable PBO synergy in Anopheles gambiae s.l. from the Democratic Republic of Congo.
Sci Rep. 2025 Jul 29;15(1):27618. doi: 10.1038/s41598-025-09016-9.
5
Ubigo-X: Protein ubiquitination site prediction using ensemble learning with image-based feature representation and weighted voting.
Comput Struct Biotechnol J. 2025 Jul 14;27:3137-3146. doi: 10.1016/j.csbj.2025.07.025. eCollection 2025.
6
GreedyMini: generating low-density DNA minimizers.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i275-i284. doi: 10.1093/bioinformatics/btaf251.
7
Identification of DNA N6-methyladenine modifications in the rice genome with a fine-tuned large language model.
Front Plant Sci. 2025 Jun 25;16:1626539. doi: 10.3389/fpls.2025.1626539. eCollection 2025.
8
MFH-LPI: based on multi-view similarity networks fusion and hypergraph learning for long non-coding RNA-protein interactions prediction.
BMC Genomics. 2025 Jul 1;26(1):597. doi: 10.1186/s12864-025-11774-9.
9
Poplar: a phylogenomics pipeline.
Bioinform Adv. 2025 May 6;5(1):vbaf104. doi: 10.1093/bioadv/vbaf104. eCollection 2025.
10
MAFcounter: an efficient tool for counting the occurrences of k-mers in MAF files.
BMC Bioinformatics. 2025 May 30;26(1):142. doi: 10.1186/s12859-025-06172-7.

References

1
kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species.
Comput Struct Biotechnol J. 2024 Apr 21;23:1919-1928. doi: 10.1016/j.csbj.2024.04.050. eCollection 2024 Dec.
2
Genome assembly in the telomere-to-telomere era.
Nat Rev Genet. 2024 Sep;25(9):658-670. doi: 10.1038/s41576-024-00718-w. Epub 2024 Apr 22.
3
The determinants of the rarity of nucleic and peptide short sequences in nature.
NAR Genom Bioinform. 2024 Apr 4;6(2):lqae029. doi: 10.1093/nargab/lqae029. eCollection 2024 Jun.
4
Genome-wide repeat landscapes in cancer and cell-free DNA.
Sci Transl Med. 2024 Mar 13;16(738):eadj9283. doi: 10.1126/scitranslmed.adj9283.
5
Utilizing nullomers in cell-free RNA for early cancer detection.
Cancer Gene Ther. 2024 Jun;31(6):861-870. doi: 10.1038/s41417-024-00741-3. Epub 2024 Feb 14.
6
Illumina reads correction: evaluation and improvements.
Sci Rep. 2024 Jan 26;14(1):2232. doi: 10.1038/s41598-024-52386-9.
7
YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae047.
8
9S1R nullomer peptide induces mitochondrial pathology, metabolic suppression, and enhanced immune cell infiltration, in triple-negative breast cancer mouse model.
Biomed Pharmacother. 2024 Jan;170:115997. doi: 10.1016/j.biopha.2023.115997. Epub 2023 Dec 20.
9
Frequentmers - a novel way to look at metagenomic next generation sequencing data and an application in detecting liver cirrhosis.
BMC Genomics. 2023 Dec 12;24(1):768. doi: 10.1186/s12864-023-09861-w.
10
Peptide absent sequences emerging in human cancers.
Eur J Cancer. 2024 Jan;196:113421. doi: 10.1016/j.ejca.2023.113421. Epub 2023 Nov 7.