Simulation study and comparative evaluation of viral contiguous sequence identification tools.

Authors

Glickman Cody, Hendrix Jo, Strong Michael

Affiliations

Center for Genes, Environment, and Health, National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA.

Computational Bioscience, University of Colorado Anschutz, 12801 E 17th Avenue, Aurora, CO, 80045, USA.

Publication information

BMC Bioinformatics. 2021 Jun 16;22(1):329. doi: 10.1186/s12859-021-04242-0.

DOI:10.1186/s12859-021-04242-0
PMID:34130621
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8207588/
Abstract

BACKGROUND

Viruses, including bacteriophages, are important components of environmental and human-associated microbial communities. Viruses can act as extracellular reservoirs of bacterial genes, can mediate microbiome dynamics, and can influence the virulence of clinical pathogens. Various targeted metagenomic analysis techniques detect viral sequences, but these methods often exclude large and genome-integrated viruses. In this study, we evaluate and compare the ability of nine state-of-the-art bioinformatic tools (Vibrant, VirSorter, VirSorter2, VirFinder, DeepVirFinder, MetaPhinder, Kraken 2, Phybrid, and a BLAST search using identified proteins from the Earth Virome Pipeline) to identify viral contiguous sequences (contigs) across simulated metagenomes with different read distributions, taxonomic compositions, and complexities.

RESULTS

Of the tools tested in this study, VirSorter achieved the best F1 score, while Vibrant had the highest average F1 score at predicting integrated prophages. Though less balanced in its precision and recall, Kraken 2 had the highest average precision by a substantial margin. We introduced the machine learning tool Phybrid, which demonstrated an improvement in average F1 score over tools such as MetaPhinder. The tool utilizes machine learning with both gene content and nucleotide features. The addition of nucleotide features improves the precision and recall compared to the gene content features alone. Viral identification by all tools was not impacted by underlying read distribution but did improve with contig length. Tool performance was inversely related to taxonomic complexity and varied by the phage host. For instance, Rhizobium and Enterococcus phages were identified consistently by the tools, whereas Neisseria prophage sequences were commonly missed in this study.
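
The results above are reported as precision, recall, and F1 score. As a minimal sketch of how these metrics relate for contig-level viral calls, the snippet below scores one tool's predictions against a set of known viral contigs. The function name `score_viral_calls`, the contig IDs, and the set-based representation are illustrative assumptions and are not taken from the study's own evaluation code.

```python
from typing import Iterable, NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_viral_calls(true_viral: Iterable[str],
                      predicted_viral: Iterable[str]) -> Scores:
    """Score contig-level viral predictions against known viral contigs.

    Both arguments are collections of contig IDs (hypothetical format).
    """
    truth = set(true_viral)
    calls = set(predicted_viral)

    tp = len(truth & calls)   # viral contigs the tool called viral
    fp = len(calls - truth)   # non-viral contigs the tool called viral
    fn = len(truth - calls)   # viral contigs the tool missed

    # Guard against empty sets to avoid division by zero.
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Toy example with made-up contig IDs: 2 true positives,
# 1 false positive, 2 false negatives.
example = score_viral_calls(
    true_viral=["c1", "c2", "c3", "c4"],
    predicted_viral=["c1", "c2", "c5"],
)
print(example)
```

Because F1 is a harmonic mean, a tool with very high precision but low recall (the pattern the abstract describes for Kraken 2) can still have a modest F1 score.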

CONCLUSION

This study benchmarked the performance of nine state-of-the-art bioinformatic tools to identify viral contigs across different simulation conditions. This study explored the ability of the tools to identify integrated prophage elements traditionally excluded from targeted sequencing approaches. Our comprehensive analysis of viral identification tools to assess their performance in a variety of situations provides valuable insights to viral researchers looking to mine viral elements from publicly available metagenomic data.
