Species annotation using a k-mer based KNN model.

Authors

Sangar Srushti, Kolage Prathamesh, Chunarkar-Patil Pritee

Affiliation

Department of Bioinformatics, Rajiv Gandhi Institute of IT and Biotechnology, Bharati Vidyapeeth (Deemed to be University), Pune, Maharashtra, India.

Publication information

Bioinformation. 2024 Sep 30;20(9):986-989. doi: 10.6026/973206300200986. eCollection 2024.

DOI:10.6026/973206300200986
PMID:39917243
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11795478/
Abstract

Bacterial identification is a critical process in microbiology, clinical diagnostics, environmental monitoring, and food safety. Machine learning holds great promise for improving bacterial identification by increasing accuracy, speed, and scalability. However, challenges such as data dependency, model interpretability, and computational demands must be addressed to fully realize its potential. The k-mer based bacterial identification algorithm described here is an attempt to address these issues. Sequence matching is performed with the k-nearest neighbors (KNN) technique. The workflow comprises feature extraction, dataset preparation, classifier training, and label prediction based on the similarity of k-mer frequency distributions. The algorithm's performance was assessed with metrics such as F1 score and precision, and it achieved an accuracy of 93%.
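The workflow outlined in the abstract (k-mer feature extraction, KNN label prediction, and evaluation with precision and F1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-mer length of 3, the Euclidean distance, the neighbor count, the toy sequences, and the labels `species_A` and `species_B` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3):
    """Relative frequency of every DNA k-mer in seq, in a fixed vocabulary order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing other letters (e.g. N) are ignored; `or 1` avoids division by zero
    total = sum(counts[m] for m in vocab) or 1
    return [counts[m] / total for m in vocab]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query_seq, k=3, n_neighbors=3):
    """Majority label among the n_neighbors training sequences closest to query_seq.

    train is a list of (sequence, label) pairs.
    """
    q = kmer_profile(query_seq, k)
    ranked = sorted(train, key=lambda item: euclidean(kmer_profile(item[0], k), q))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy training set (invented sequences, hypothetical labels): AT-rich vs GC-rich
toy_train = [
    ("ATATATATATTTAAAT", "species_A"),
    ("AATTAATTATATATAA", "species_A"),
    ("GCGCGCGGCCGCGGCC", "species_B"),
    ("GGCCGCGCGCCGGCGC", "species_B"),
]

if __name__ == "__main__":
    queries = ["ATATTTAATATAATAT", "GCGGCCGCGCGGCGCC"]
    truth = ["species_A", "species_B"]
    predicted = [knn_predict(toy_train, s) for s in queries]
    print(predicted)
    print(precision_recall_f1(truth, predicted, positive="species_A"))
```

In this sketch the training profiles are recomputed for every query to keep the code short; a real pipeline would compute them once and would typically choose the k-mer length and neighbor count by validation.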

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fe/11795478/426047755f97/973206300200986F1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fe/11795478/59d737f4aa86/973206300200986F2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fe/11795478/bc02c9248026/973206300200986F3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fe/11795478/8ad773375cec/973206300200986F4.jpg
