A new alignment-free method: K-mer Subsequence Natural Vector (K-mer SNV) for classification of fungi.

Authors

He Lily, Huang Mochao, Yiming Gulinisha, Zhu Yi, Liu Ruowei, Chen Jinghan, Yau Stephen S T

Affiliations

School of Science, Beijing University of Civil Engineering and Architecture, Beijing, 102616, People's Republic of China.

Beijing Institute of Mathematical Sciences and Application, Beijing, 100084, People's Republic of China.

Publication information

BMC Bioinformatics. 2025 Jul 9;26(1):170. doi: 10.1186/s12859-025-06152-x.

Abstract

As eukaryotic organisms, fungi play a pivotal role within ecosystems and exert profound influences on agriculture, the pharmaceutical industry, and human health. The classification of fungi in databases has emerged as a crucial and complex issue in the field of biology. In this study, by leveraging the local distribution of k-mers in nucleotide sequences, we introduce a novel alignment-free method, denoted as k-mer SNV, to address this challenge. On a large fungal dataset of 120,140 sequences, our approach predicts the taxonomic labels of fungi across six hierarchical taxonomic levels: phylum (99.52%), class (98.17%), order (97.20%), family (96.11%), genus (94.14%), and species (93.32%). The approach is also evaluated on the common Taxxi benchmark dataset. These results demonstrate that the k-mer SNV method performs well in processing large-scale fungal sequence data.
