CAKL: Commutative algebra k-mer learning of genomics.

Authors

Suwayyid Faisal, Hozumi Yuta, Feng Hongsong, Zia Mushal, Wee JunJie, Wei Guo-Wei

Affiliations

Department of Mathematics, King Fahd University of Petroleum and Minerals, Dhahran 31261, KSA.

Department of Mathematics, Michigan State University, MI 48824, USA.

Publication information

ArXiv. 2025 Aug 13:arXiv:2508.09406v1.

PMID:40832044
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12364056/
Abstract

Despite the availability of various sequence analysis models, comparative genomic analysis remains a challenge in genomics, genetics, and phylogenetics. Commutative algebra, a fundamental tool in algebraic geometry and number theory, has rarely been used in data and biological sciences. In this study, we introduce commutative algebra k-mer learning (CAKL) as the first-ever nonlinear algebraic framework for analyzing genomic sequences. CAKL bridges commutative algebra, algebraic topology, combinatorics, and machine learning to establish a new mathematical paradigm for comparative genomic analysis. We evaluate its effectiveness on three tasks, namely genetic variant identification, phylogenetic tree analysis, and viral genome classification, which typically require alignment-based, alignment-free, and machine-learning approaches, respectively. Across eleven datasets, CAKL outperforms five state-of-the-art sequence analysis methods, particularly in viral classification, and maintains stable predictive accuracy as dataset size increases, underscoring its scalability and robustness. This work ushers in a new era in commutative algebraic data analysis and learning.
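The abstract does not spell out CAKL's algebraic construction, so the following is only a minimal sketch of the generic k-mer step that k-mer-based genomic methods start from: recording where each k-mer occurs in a DNA sequence, and turning those occurrences into a frequency vector that supports a simple alignment-free distance. The function names (`kmer_positions`, `kmer_frequency_vector`, `euclidean_distance`) and the choice of Euclidean distance are illustrative assumptions; this is not the authors' CAKL implementation.

```python
from collections import defaultdict
from itertools import product
import math

ALPHABET = "ACGT"


def kmer_positions(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer to the list of 0-based start positions where it occurs.

    Hypothetical helper for illustration; k-mers containing characters outside
    ACGT (e.g. the ambiguity code N) are skipped.
    """
    seq = seq.upper()
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            positions[kmer].append(i)  # positions are appended in increasing order
    return dict(positions)


def kmer_frequency_vector(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k k-mers, in lexicographic order of ACGT."""
    counts = {kmer: len(pos) for kmer, pos in kmer_positions(seq, k).items()}
    total = sum(counts.values())
    vocab = ("".join(p) for p in product(ALPHABET, repeat=k))
    return [counts.get(kmer, 0) / total if total else 0.0 for kmer in vocab]


def euclidean_distance(u: list[float], v: list[float]) -> float:
    """Plain Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    print(kmer_positions("ACGTACG", 3))
    a = kmer_frequency_vector("ACGTACGTTA", 2)
    b = kmer_frequency_vector("ACGAACGATA", 2)
    print(euclidean_distance(a, b))
```

For example, `kmer_positions("ACGTACG", 3)` returns `{"ACG": [0, 4], "CGT": [1], "GTA": [2], "TAC": [3]}`. Keeping positions rather than bare counts preserves where each k-mer sits along the sequence, which is the kind of positional information a richer (for instance topological or algebraic) representation could be built on; the frequency vector discards it and serves only as a simple alignment-free baseline.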

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2e9/12364056/236f124c96f7/nihpp-2508.09406v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2e9/12364056/892e0818e7f6/nihpp-2508.09406v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2e9/12364056/ab1050e6f91d/nihpp-2508.09406v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2e9/12364056/af9560b529f7/nihpp-2508.09406v1-f0004.jpg
