Benchmarking informatics approaches for virus discovery: caution is needed when combining identification methods.

Affiliations

Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio, USA.

Department of Microbiology, The Ohio State University, Columbus, Ohio, USA.

Publication information

mSystems. 2024 Mar 19;9(3):e0110523. doi: 10.1128/msystems.01105-23. Epub 2024 Feb 20.


DOI:10.1128/msystems.01105-23
PMID:38376167
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10949488/
Abstract

Understanding the ecological impacts of viruses on natural and engineered ecosystems relies on the accurate identification of viral sequences from community sequencing data. To maximize viral recovery from metagenomes, researchers frequently combine viral identification tools. However, the effectiveness of this strategy is unknown. Here, we benchmarked combinations of six widely used informatics tools for viral identification and analysis (VirSorter, VirSorter2, VIBRANT, DeepVirFinder, CheckV, and Kaiju), called "rulesets." Rulesets were tested against mock metagenomes composed of taxonomically diverse sequence types and diverse aquatic metagenomes to assess the effects of the degree of viral enrichment and habitat on tool performance. We found that six rulesets achieved equivalent accuracy [Matthews Correlation Coefficient (MCC) = 0.77, P_adj ≥ 0.05]. Each contained VirSorter2, and five used our "tuning removal" rule designed to remove non-viral contamination. While DeepVirFinder, VIBRANT, and VirSorter were each found once in these high-accuracy rulesets, they were not found in combination with each other: combining tools does not lead to optimal performance. Our validation suggests that the MCC plateau at 0.77 is partly caused by inaccurate labeling within reference sequence databases. In aquatic metagenomes, our highest MCC ruleset identified more viral sequences in virus-enriched (44%-46%) than in cellular metagenomes (7%-19%). While improved algorithms may lead to more accurate viral identification tools, this should be done in tandem with careful curation of sequence databases. We recommend using the VirSorter2 ruleset and our empirically derived tuning removal rule. Our analysis provides insight into methods for viral identification and will enable more robust viral identification from metagenomic data sets.
IMPORTANCE: The identification of viruses from environmental metagenomes using informatics tools has offered critical insights in microbial ecology. However, it remains difficult for researchers to know which tools optimize viral recovery for their specific study. In an attempt to recover more viruses, studies are increasingly combining the outputs from multiple tools without validating this approach. After benchmarking combinations of six viral identification tools against mock metagenomes and environmental samples, we found that these tools should only be combined cautiously. Two to four tool combinations maximized viral recovery and minimized non-viral contamination compared with either the single-tool or the five- to six-tool ones. By providing a rigorous overview of the behavior of viral identification strategies and a pipeline to replicate our process, our findings guide the use of existing viral identification tools and offer a blueprint for feature engineering of new tools that will lead to higher-confidence viral discovery in microbiome studies.
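
For readers unfamiliar with the accuracy metric reported in the abstract, the following is a minimal sketch of how a Matthews Correlation Coefficient can be computed from a confusion matrix of viral versus non-viral calls. The function name, the zero-denominator convention, and the example counts are illustrative assumptions; they are not taken from the paper or its pipeline.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Compute the MCC from confusion-matrix counts.

    tp: viral sequences correctly called viral
    tn: non-viral sequences correctly called non-viral
    fp: non-viral sequences wrongly called viral
    fn: viral sequences that were missed
    Returns 0.0 when the denominator is zero (a common convention,
    assumed here rather than taken from the paper).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for one ruleset on a mock metagenome.
    score = matthews_corrcoef(tp=80, tn=90, fp=10, fn=20)
    print(f"MCC = {score:.3f}")
```

The MCC ranges from -1 (every call wrong) through 0 (no better than chance) to 1 (every call correct). Because it uses all four cells of the confusion matrix, it stays informative when viral sequences are a small minority of the contigs, which is the typical case for cellular metagenomes.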

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b25/10949488/70fc37d93b89/msystems.01105-23.f001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b25/10949488/a5d0877bb007/msystems.01105-23.f002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b25/10949488/bb6bab7546be/msystems.01105-23.f003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b25/10949488/9d9c0f312f58/msystems.01105-23.f004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b25/10949488/ab637754f6b6/msystems.01105-23.f005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b25/10949488/d4e6e55d2a16/msystems.01105-23.f006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b25/10949488/d03da6ac6bd5/msystems.01105-23.f007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b25/10949488/ee477dcf102b/msystems.01105-23.f008.jpg
