MT-MAG: Accurate and interpretable machine learning for complete or partial taxonomic assignments of metagenome-assembled genomes.

Affiliations

School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada.

Department of Biology, University of Waterloo, Waterloo, Ontario, Canada.

Publication information

PLoS One. 2023 Aug 18;18(8):e0283536. doi: 10.1371/journal.pone.0283536. eCollection 2023.

DOI:10.1371/journal.pone.0283536
PMID:37594964
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10437822/
Abstract

We propose MT-MAG, a novel machine learning-based software tool for the complete or partial hierarchically-structured taxonomic classification of metagenome-assembled genomes (MAGs). MT-MAG is alignment-free, with k-mer frequencies being the only feature used to distinguish one DNA sequence from another (herein k = 7). MT-MAG is capable of classifying large and diverse metagenomic datasets: a total of 245.68 Gbp in the training sets, and 9.6 Gbp in the test sets analyzed in this study. In addition to complete classifications, MT-MAG offers a "partial classification" option, whereby a classification at a higher taxonomic level is provided for MAGs that cannot be classified to the Species level. MT-MAG outputs complete or partial classification paths, and interpretable numerical classification confidences of its classifications, at all taxonomic ranks. To assess the performance of MT-MAG, we define a "weighted classification accuracy," with a weighting scheme reflecting the fact that partial classifications at different ranks are not equally informative. For the two benchmarking datasets analyzed (genomes from human gut microbiome species, and bacterial and archaeal genomes assembled from cow rumen metagenomic sequences), MT-MAG achieves an average of 87.32% in weighted classification accuracy. At the Species level, MT-MAG outperforms DeepMicrobes, the only other comparable software tool, by an average of 34.79% in weighted classification accuracy. In addition, MT-MAG is able to completely classify an average of 67.70% of the sequences at the Species level, compared with DeepMicrobes, which classifies only 47.45%. Moreover, MT-MAG provides additional information for sequences that it could not classify at the Species level, resulting in the partial or complete classification of 95.13% of the genomes in the datasets analyzed. Lastly, unlike other taxonomic assignment tools (e.g., GTDB-Tk), MT-MAG is an alignment-free and genetic marker-free tool, able to provide additional bioinformatics analysis to confirm existing or tentative taxonomic assignments.
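
The abstract states that k-mer frequencies (k = 7) are the only feature MT-MAG uses. As a rough illustration of what such a feature vector looks like, the following minimal sketch counts overlapping k-mers in a DNA string and normalizes them to relative frequencies. The function name `kmer_frequencies`, the handling of ambiguous bases, and the normalization are assumptions made for this example; the abstract does not describe MT-MAG's actual preprocessing, which may differ.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 7) -> dict[str, float]:
    """Return relative frequencies of all 4**k DNA k-mers in `sequence`.

    Illustrative only: windows containing characters other than
    A, C, G, T (for example N) are skipped, which is an assumption
    of this sketch rather than a documented MT-MAG behavior.
    """
    alphabet = "ACGT"
    seq = sequence.upper()
    counts = Counter()
    # Slide a window of length k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(base in alphabet for base in window):
            counts[window] += 1
    total = sum(counts.values())
    # Enumerate every possible k-mer in a fixed order so that all
    # sequences map to vectors of equal length (4**k = 16384 for k = 7).
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(alphabet, repeat=k)
    }


# Small k keeps the toy example readable; the paper uses k = 7.
freqs = kmer_frequencies("ACGTACGTAC", k=2)
```

Because the vector has a fixed length regardless of genome size, sequences of very different lengths can be compared without alignment, which is the property the abstract emphasizes.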

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26fc/10437822/124916981c0a/pone.0283536.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26fc/10437822/242a704271b9/pone.0283536.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26fc/10437822/6f67e1ddf7f3/pone.0283536.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26fc/10437822/df113c196450/pone.0283536.g004.jpg
