VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data.

Author affiliations

Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA.

Department of Biological Sciences, University of Southern California, 3616 Trousdale Pkwy, Los Angeles, CA, 90089, USA.

Publication information

Microbiome. 2017 Jul 6;5(1):69. doi: 10.1186/s40168-017-0283-5.


DOI:10.1186/s40168-017-0283-5
PMID:28683828
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5501583/
Abstract

BACKGROUND: Identifying viral sequences in mixed metagenomes containing both viral and host contigs is a critical first step in analyzing the viral component of samples. Current tools for distinguishing prokaryotic virus and host contigs primarily use gene-based similarity approaches. Such approaches can significantly limit results especially for short contigs that have few predicted proteins or lack proteins with similarity to previously known viruses.

METHODS: We have developed VirFinder, the first k-mer frequency based, machine learning method for virus contig identification that entirely avoids gene-based similarity searches. VirFinder instead identifies viral sequences based on our empirical observation that viruses and hosts have discernibly different k-mer signatures. VirFinder's performance in correctly identifying viral sequences was tested by training its machine learning model on sequences from host and viral genomes sequenced before 1 January 2014 and evaluating on sequences obtained after 1 January 2014.

RESULTS: VirFinder had significantly better rates of identifying true viral contigs (true positive rates (TPRs)) than VirSorter, the current state-of-the-art gene-based virus classification tool, when evaluated with either contigs subsampled from complete genomes or assembled from a simulated human gut metagenome. For example, for contigs subsampled from complete genomes, VirFinder had 78-, 2.4-, and 1.8-fold higher TPRs than VirSorter for 1, 3, and 5 kb contigs, respectively, at the same false positive rates as VirSorter (0, 0.003, and 0.006, respectively), thus VirFinder works considerably better for small contigs than VirSorter. VirFinder furthermore identified several recently sequenced virus genomes (after 1 January 2014) that VirSorter did not and that have no nucleotide similarity to previously sequenced viruses, demonstrating VirFinder's potential advantage in identifying novel viral sequences. Application of VirFinder to a set of human gut metagenomes from healthy and liver cirrhosis patients reveals higher viral diversity in healthy individuals than cirrhosis patients. We also identified contig bins containing crAssphage-like contigs with higher abundance in healthy patients and a putative Veillonella genus prophage associated with cirrhosis patients.

CONCLUSIONS: This innovative k-mer based tool complements gene-based approaches and will significantly improve prokaryotic viral sequence identification, especially for metagenomic-based studies of viral ecology.
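
The two ideas the abstract relies on, a k-mer frequency signature per contig and a true positive rate compared at a matched false positive rate, can be illustrated with a minimal Python sketch. This is not VirFinder's implementation: the function names `kmer_frequencies` and `tpr_at_fpr`, the default `k=4`, and the toy scores are assumptions made for illustration, and the abstract does not state the value of k or the classifier that was used.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=4):
    """Return the relative frequency of every DNA k-mer in seq (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)  # skip windows containing N or other symbols
    )
    total = sum(counts.values())
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]  # fixed feature order
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def tpr_at_fpr(scores, labels, max_fpr):
    """Best TPR over score thresholds whose FPR does not exceed max_fpr.

    labels: 1 = viral contig, 0 = host contig.
    A contig is called viral when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    # Lowering the threshold can only add calls, so TPR and FPR both grow monotonically.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if n_neg and fp / n_neg > max_fpr:
            break
        if n_pos:
            best = tp / n_pos
    return best


if __name__ == "__main__":
    profile = kmer_frequencies("ACGTACGT", k=2)
    print(len(profile), profile["AC"])

    # Toy classifier scores, invented purely for illustration.
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    labels = [1, 1, 0, 1, 0, 0]
    print(tpr_at_fpr(scores, labels, max_fpr=0.0))
```

In a real pipeline the frequency vectors would be fed to a trained classifier, and the scores passed to `tpr_at_fpr` would come from that classifier on held-out contigs, such as the sequences obtained after 1 January 2014 in the evaluation described above.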


Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcdd/5501583/cdbf9632a3e4/40168_2017_283_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcdd/5501583/4a43798b6953/40168_2017_283_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcdd/5501583/2f0b2e3feeab/40168_2017_283_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcdd/5501583/a48c5d3fbb5e/40168_2017_283_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcdd/5501583/524042aabcf3/40168_2017_283_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcdd/5501583/e292ae964dc8/40168_2017_283_Fig6_HTML.jpg
Fig. 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcdd/5501583/f07e087bac06/40168_2017_283_Fig7_HTML.jpg
Fig. 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bcdd/5501583/f59de0923760/40168_2017_283_Fig8_HTML.jpg
