MT-MAG: Accurate and interpretable machine learning for complete or partial taxonomic assignments of metagenome-assembled genomes.

Affiliations

School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada.

Department of Biology, University of Waterloo, Waterloo, Ontario, Canada.

Publication information

PLoS One. 2023 Aug 18;18(8):e0283536. doi: 10.1371/journal.pone.0283536. eCollection 2023.

Abstract

We propose MT-MAG, a novel machine learning-based software tool for the complete or partial hierarchically structured taxonomic classification of metagenome-assembled genomes (MAGs). MT-MAG is alignment-free: k-mer frequencies are the only feature used to distinguish one DNA sequence from another (here, k = 7). MT-MAG is capable of classifying large and diverse metagenomic datasets: a total of 245.68 Gbp in the training sets and 9.6 Gbp in the test sets analyzed in this study. In addition to complete classifications, MT-MAG offers a "partial classification" option, whereby a classification at a higher taxonomic level is provided for MAGs that cannot be classified to the Species level. MT-MAG outputs complete or partial classification paths, together with interpretable numerical confidences for its classifications, at all taxonomic ranks. To assess the performance of MT-MAG, we define a "weighted classification accuracy," with a weighting scheme reflecting the fact that partial classifications at different ranks are not equally informative. For the two benchmarking datasets analyzed (genomes from human gut microbiome species, and bacterial and archaeal genomes assembled from cow rumen metagenomic sequences), MT-MAG achieves an average of 87.32% in weighted classification accuracy. At the Species level, MT-MAG outperforms DeepMicrobes, the only other comparable software tool, by an average of 34.79% in weighted classification accuracy. In addition, MT-MAG is able to completely classify an average of 67.70% of the sequences at the Species level, compared with DeepMicrobes, which classifies only 47.45%. Moreover, MT-MAG provides additional information for sequences that it could not classify at the Species level, resulting in the partial or complete classification of 95.13% of the genomes in the datasets analyzed. Lastly, unlike other taxonomic assignment tools (e.g., GTDB-Tk), MT-MAG is an alignment-free and genetic marker-free tool, able to provide additional bioinformatics analysis to confirm existing or tentative taxonomic assignments.
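
To make the feature described in the abstract concrete, the following is a minimal Python sketch of computing a normalized k-mer frequency vector for a DNA sequence with k = 7. It is an illustration only and not MT-MAG's actual code: the function name `kmer_frequencies`, the lexicographic A/C/G/T ordering, and the choice to skip windows containing ambiguous bases (such as N) are assumptions, and the sketch does not handle reverse complements.

```python
from itertools import product


def kmer_frequencies(sequence: str, k: int = 7) -> list[float]:
    """Return normalized k-mer frequencies (hypothetical helper, not MT-MAG's API).

    The vector has 4**k entries, ordered lexicographically over A, C, G, T.
    """
    alphabet = "ACGT"
    # Map every possible k-mer to its position in the output vector.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = sequence.upper()
    # Slide a window of length k over the sequence and count each k-mer.
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # assumption: skip windows with non-ACGT characters
            counts[pos] += 1
    total = sum(counts)
    # Normalize counts to frequencies; an all-zero vector if no valid k-mer was found.
    return [c / total for c in counts] if total else [0.0] * len(counts)


if __name__ == "__main__":
    vector = kmer_frequencies("ACGTACGTACGTTTGACA", k=7)
    print(len(vector))  # number of features: 4**7
```

With k = 7, every sequence is mapped to a vector of 4^7 = 16,384 values regardless of its length, which is what allows sequences to be compared without alignment.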


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26fc/10437822/124916981c0a/pone.0283536.g001.jpg
