The Promises and Pitfalls of Machine Learning for Detecting Viruses in Aquatic Metagenomes.

Authors

Ponsero Alise J, Hurwitz Bonnie L

Affiliations

Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States.

BIO5 Institute, The University of Arizona, Tucson, AZ, United States.

Publication information

Front Microbiol. 2019 Apr 16;10:806. doi: 10.3389/fmicb.2019.00806. eCollection 2019.


DOI:10.3389/fmicb.2019.00806
PMID:31057513
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6477088/
Abstract

Tools that identify viral sequences in host-associated and environmental metagenomes give us a better understanding of the genetics and ecology of viruses and their hosts. Recent machine learning approaches identify viral contigs in metagenomes by using k-mer sequence signatures to separate viral from bacterial signal. The promise of these content-based approaches is that they can discover new viruses with few or no known relatives. In this perspective paper, we examine how the content-based machine learning tool VirFinder performs at identifying viral sequences in aquatic metagenomes. We also explore the possibility of ecosystem-focused models targeted to marine metagenomes. We discuss how the composition of the training set affects the tool's performance, and the current limitations on retrieving low-abundance viral sequences from metagenomes. Finally, we identify potential biases that machine learning approaches may introduce when hunting for viruses in real-world datasets, and we suggest possible ways to overcome them.
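
The core idea behind these content-based tools can be sketched in a few lines. Each contig is turned into a k-mer frequency signature, and a classifier is trained on labeled examples to separate viral from bacterial signal. This is a minimal illustration under stated assumptions, not VirFinder's implementation. Every choice below is hypothetical and made for readability:

- the k-mer length (k=2);
- the unregularized logistic regression trained by stochastic gradient descent;
- the helper names `kmer_profile`, `train_logistic`, and `viral_score`;
- the made-up AT-rich and GC-rich "training contigs".

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 2) -> list[float]:
    """Normalized frequencies of all 4**k DNA k-mers, in lexicographic ACGT order.

    k-mers containing anything other than A/C/G/T (e.g. 'N') are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    return [counts["".join(p)] / total if total else 0.0
            for p in product("ACGT", repeat=k)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Hypothetical helper: plain SGD logistic regression, no regularization."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi  # gradient of the log-loss w.r.t. the linear score
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def viral_score(seq, w, b, k=2):
    """Score in [0, 1] for membership in the class labeled 1 ("viral")."""
    x = kmer_profile(seq, k)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, made-up training contigs: label 1 = "viral", label 0 = "bacterial".
# They are deliberately AT-rich vs GC-rich so the example is easy to follow.
viral_train = ["ATTATAATTTAAT", "TTAATATATTAA"]
bacterial_train = ["GCGGCCGCGGGC", "CCGGCGCGCCGG"]
X = [kmer_profile(s) for s in viral_train + bacterial_train]
y = [1] * len(viral_train) + [0] * len(bacterial_train)
w, b = train_logistic(X, y)

print(round(viral_score("AATTATTA", w, b), 3))
print(round(viral_score("GGCCGCGC", w, b), 3))
```

In this toy setup, a contig is scored only through the compositional features that the labeled examples happened to weight. In a simplified way, this shows why the composition of the training set shapes what such a model can detect. A practical tool would typically use longer k-mers and far larger curated training sets.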

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b6f/6477088/39188effca8c/fmicb-10-00806-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b6f/6477088/a0e77d2f2ed7/fmicb-10-00806-g002.jpg
