He Lily, Huang Mochao, Yiming Gulinisha, Zhu Yi, Liu Ruowei, Chen Jinghan, Yau Stephen S T
School of Science, Beijing University of Civil Engineering and Architecture, Beijing, 102616, People's Republic of China.
Beijing Institute of Mathematical Sciences and Application, Beijing, 100084, People's Republic of China.
BMC Bioinformatics. 2025 Jul 9;26(1):170. doi: 10.1186/s12859-025-06152-x.
Fungi are eukaryotic organisms that play a pivotal role in ecosystems and strongly influence agriculture, the pharmaceutical industry, and human health. Classifying fungi in databases is an important and complex problem in biology. In this study, we introduce a novel alignment-free method, denoted k-mer SNV, that addresses this problem by leveraging the local distribution of k-mers in nucleotide sequences. On a large fungal dataset of 120,140 sequences, the method predicts the taxonomic labels of fungi at six hierarchical taxonomic levels: phylum (99.52%), class (98.17%), order (97.20%), family (96.11%), genus (94.14%), and species (93.32%). The approach is also evaluated on the common Taxxi benchmark dataset. These results indicate that the k-mer SNV method performs well on large-scale fungal sequence data.
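The abstract does not spell out how the local distribution of k-mers is encoded. As a rough illustration of the general idea only, the Python sketch below records, for every k-mer over the alphabet ACGT, its count, its mean start position, and a scaled variance of its start positions. The function name `kmer_position_features`, the choice of these three statistics, and the scaling are assumptions made for this example; they are not the published k-mer SNV definition.

```python
from collections import defaultdict
from itertools import product


def kmer_position_features(seq, k=2, alphabet="ACGT"):
    """Return {kmer: (count, mean_position, scaled_variance)} for one sequence.

    Hypothetical illustration only; not the published k-mer SNV definition.
    """
    seq = seq.upper()
    positions = defaultdict(list)

    # Record the 1-based start position of every k-mer occurrence.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(c in alphabet for c in kmer):  # skip ambiguous bases such as N
            positions[kmer].append(i + 1)

    n_windows = max(len(seq) - k + 1, 1)
    features = {}

    # Emit one entry per possible k-mer so every sequence maps to the same keys.
    for kmer in map("".join, product(alphabet, repeat=k)):
        pos = positions.get(kmer, [])
        count = len(pos)
        if count == 0:
            features[kmer] = (0, 0.0, 0.0)
            continue
        mean = sum(pos) / count
        # Second central moment of the positions, scaled by count * window count
        # (an assumed normalization chosen for this sketch).
        var = sum((p - mean) ** 2 for p in pos) / (count * n_windows)
        features[kmer] = (count, mean, var)
    return features
```

Because no alignment is needed, each sequence is reduced to a fixed-length set of numbers (4^k k-mers times three statistics) that could then be compared with an ordinary distance measure or passed to a standard classifier. How the authors actually build and compare their vectors is not stated in the abstract.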