CAKL: Commutative algebra k-mer learning of genomics.

Author information

Suwayyid Faisal, Hozumi Yuta, Feng Hongsong, Zia Mushal, Wee JunJie, Wei Guo-Wei

Affiliations

Department of Mathematics, King Fahd University of Petroleum and Minerals, Dhahran 31261, KSA.

Department of Mathematics, Michigan State University, MI 48824, USA.

Publication information

ArXiv. 2025 Aug 13:arXiv:2508.09406v1.

PMID:40832044

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12364056/

Abstract

Despite the availability of various sequence analysis models, comparative genomic analysis remains a challenge in genomics, genetics, and phylogenetics. Commutative algebra, a fundamental tool in algebraic geometry and number theory, has rarely been used in the data and biological sciences. In this study, we introduce commutative algebra k-mer learning (CAKL) as the first nonlinear algebraic framework for analyzing genomic sequences. CAKL bridges commutative algebra, algebraic topology, combinatorics, and machine learning to establish a new mathematical paradigm for comparative genomic analysis. We evaluate its effectiveness on three tasks (genetic variant identification, phylogenetic tree analysis, and viral genome classification) that typically require alignment-based, alignment-free, and machine-learning approaches, respectively. Across eleven datasets, CAKL outperforms five state-of-the-art sequence analysis methods, particularly in viral classification, and maintains stable predictive accuracy as dataset size increases, underscoring its scalability and robustness. This work ushers in a new era in commutative algebraic data analysis and learning.
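CAKL takes k-mers, the length-k substrings of a sequence, as its starting point. The Python below is a minimal sketch of two generic k-mer representations widely used in alignment-free sequence comparison: the positions at which each k-mer occurs, and a fixed-length count vector over the ACGT alphabet. It is not the authors' CAKL code and omits the commutative-algebra construction entirely. The function names and the 1-based position convention are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def kmer_positions(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer present in `seq` to its 1-based start positions (illustrative helper)."""
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        # Slide a window of width k and record where each substring starts.
        positions.setdefault(seq[i:i + k], []).append(i + 1)
    return positions


def kmer_count_vector(seq: str, k: int, alphabet: str = "ACGT") -> list[int]:
    """Count all |alphabet|**k possible k-mers, ordered lexicographically (illustrative helper)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Counter returns 0 for k-mers that never occur, so the vector length is fixed.
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]
```

For example, `kmer_count_vector("ACGTAC", 2)` returns a 16-entry vector with a 2 in the `AC` slot, because that 2-mer starts at positions 1 and 5. A fixed-length vector like this lets sequences of different lengths be compared without alignment.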
