3CAC: improving the classification of phages and plasmids in metagenomic assemblies using assembly graphs.

Affiliation

The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel.

Publication information

Bioinformatics. 2022 Sep 16;38(Suppl_2):ii56-ii61. doi: 10.1093/bioinformatics/btac468.

Abstract

MOTIVATION

Bacteriophages and plasmids usually coexist with their host bacteria in microbial communities and play important roles in microbial evolution. Accurately identifying sequence contigs as phages, plasmids and bacterial chromosomes in mixed metagenomic assemblies is critical for further unraveling their functions. Many classification tools have been developed for identifying either phages or plasmids in metagenomic assemblies. However, only two classifiers, PPR-Meta and viralVerify, were proposed to simultaneously identify phages and plasmids in mixed metagenomic assemblies. Due to the very high fraction of chromosome contigs in the assemblies, both tools achieve high precision in the classification of chromosomes but perform poorly in classifying phages and plasmids. Short contigs in these assemblies are often wrongly classified or classified as uncertain.

RESULTS

Here we present 3CAC, a new three-class classifier that improves the precision of phage and plasmid classification. 3CAC starts with an initial three-class classification generated by existing classifiers and improves the classification of short contigs and contigs with low-confidence classifications by using proximity in the assembly graph. Evaluation on simulated metagenomes and on real human gut microbiome samples showed that 3CAC outperformed PPR-Meta and viralVerify in both precision and recall, and increased F1-score by 10-60 percentage points.
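
The abstract does not spell out the refinement rule, so the following is only a minimal sketch of the general idea of using assembly-graph proximity, not the 3CAC implementation. It assumes a hypothetical rule: a contig labeled "uncertain" takes a class only when all of its already classified neighbors agree. The function name `refine_labels`, the label strings, and the edge-list input are illustrative assumptions.

```python
from collections import Counter


def refine_labels(labels, edges, rounds=10):
    """Relabel 'uncertain' contigs from their classified graph neighbors.

    labels: dict mapping contig id -> 'phage', 'plasmid', 'chromosome',
            or 'uncertain' (the initial three-class call plus a reject class).
    edges:  iterable of (u, v) pairs from the assembly graph.
    Assumed rule (not taken from the paper): adopt a class only if every
    classified neighbor carries that same class.
    """
    # Build an undirected adjacency map, ignoring self-loops and unknown ids.
    neighbors = {contig: set() for contig in labels}
    for u, v in edges:
        if u in neighbors and v in neighbors and u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)

    refined = dict(labels)
    for _ in range(rounds):
        updates = {}
        for contig, label in refined.items():
            if label != "uncertain":
                continue
            votes = Counter(
                refined[n] for n in neighbors[contig] if refined[n] != "uncertain"
            )
            if len(votes) == 1:  # all classified neighbors agree
                updates[contig] = next(iter(votes))
        if not updates:
            break  # nothing changed, so further rounds cannot help
        refined.update(updates)  # apply after the pass so order does not matter
    return refined


# Toy graph: c2 and c3 hang off a chromosome contig; x touches both a
# phage and a plasmid contig, so it stays uncertain.
initial = {
    "c1": "chromosome", "c2": "uncertain", "c3": "uncertain",
    "p1": "phage", "pl": "plasmid", "x": "uncertain",
}
graph_edges = [("c1", "c2"), ("c2", "c3"), ("x", "p1"), ("x", "pl")]
result = refine_labels(initial, graph_edges)
```

In this toy case the chromosome label spreads along the path c1, c2, c3 over two rounds, while `x` is left untouched because its neighbors disagree. Requiring agreement is a conservative choice that favors precision over recall; a majority vote would be a looser alternative.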

AVAILABILITY AND IMPLEMENTATION

The 3CAC software is available at https://github.com/Shamir-Lab/3CAC.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
