Virtifier: a deep learning-based identifier for viral sequences from metagenomes.

Affiliation

College of Communication Engineering, Jilin University, Changchun 130022, China.

Publication information

Bioinformatics. 2022 Feb 7;38(5):1216-1222. doi: 10.1093/bioinformatics/btab845.

DOI: 10.1093/bioinformatics/btab845
PMID: 34908121
Abstract

MOTIVATION

Viruses, the most abundant biological entities on earth, are important components of microbial communities, and as major human pathogens, they are responsible for human mortality and morbidity. The identification of viral sequences from metagenomes is critical for viral analysis. As massive quantities of short sequences are generated by next-generation sequencing, most methods utilize discrete and sparse one-hot vectors to encode nucleotide sequences, which are usually ineffective in viral identification.

RESULTS

In this article, Virtifier, a deep learning-based viral identifier for sequences from metagenomic data, is proposed. It includes a meaningful nucleotide sequence encoding method named Seq2Vec and a variant viral sequence predictor with an attention-based long short-term memory (LSTM) network. By using a fully trained embedding matrix to encode codons, Seq2Vec can efficiently extract the relationships among the codons in a nucleotide sequence. Combined with an attention layer, the LSTM network can further analyze the codon relationships and select the parts that contribute to the final features. Experimental results on three datasets show that Virtifier can accurately identify short viral sequences (<500 bp) from metagenomes, surpassing three widely used methods: VirFinder, DeepVirFinder and PPR-Meta. Virtifier also achieved comparable performance on longer sequences (>5000 bp).
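The encoding idea described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the non-overlapping codon split, the reserved index 0 for ambiguous codons, the randomly initialized embedding matrix (standing in for a trained one), and the helper names `tokenize_codons`, `make_embedding` and `attention_pool` are all assumptions made for illustration. The LSTM layer is omitted, so the softmax attention here is applied directly to the codon embeddings rather than to LSTM hidden states.

```python
import itertools
import math
import random

# All 64 codons in a fixed order. Index 0 is reserved for codons that contain
# ambiguous bases such as N (an assumption of this sketch).
CODONS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]
CODON_TO_ID = {codon: i + 1 for i, codon in enumerate(CODONS)}
UNK_ID = 0


def tokenize_codons(seq, frame=0):
    """Split a sequence into non-overlapping codons and map them to integer IDs.

    Trailing bases that do not fill a complete codon are dropped.
    """
    seq = seq.upper()
    ids = []
    for start in range(frame, len(seq) - 2, 3):
        ids.append(CODON_TO_ID.get(seq[start:start + 3], UNK_ID))
    return ids


def make_embedding(vocab_size, dim, seed=0):
    """Build a random embedding matrix (a stand-in for a trained one)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(vocab_size)]


def attention_pool(vectors, query):
    """Score each position against `query`, softmax the scores, and return
    the attention weights together with the weighted sum of the vectors."""
    scores = [sum(v_k * q_k for v_k, q_k in zip(v, query)) for v in vectors]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors))
              for k in range(len(query))]
    return weights, pooled


# Usage: dense codon embeddings replace sparse one-hot vectors.
ids = tokenize_codons("ATGGCGTTTAAC")                # four codons
embedding = make_embedding(len(CODONS) + 1, dim=8)   # 64 codons + unknown
vectors = [embedding[i] for i in ids]                # embedding lookup
weights, pooled = attention_pool(vectors, query=[1.0] * 8)
```

In a trained model the embedding matrix and the attention query would be learned parameters; the fixed query of ones is only a placeholder that makes the sketch runnable.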

AVAILABILITY AND IMPLEMENTATION

A Python implementation of Virtifier and the Python code developed for this study are available on GitHub at https://github.com/crazyinter/Seq2Vec. The RefSeq genomes used in this article are available via VirFinder at https://dx.doi.org/10.1186/s40168-017-0283-5. The CAMI Challenge Dataset 3 (CAMI_high) used in this article is available from CAMI at https://data.cami-challenge.org/participate. The real human gut metagenomes used in this article are available at https://dx.doi.org/10.1101/gr.142315.112.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. Virtifier: a deep learning-based identifier for viral sequences from metagenomes.
   Bioinformatics. 2022 Feb 7;38(5):1216-1222. doi: 10.1093/bioinformatics/btab845.
2. RNN-VirSeeker: A Deep Learning Method for Identification of Short Viral Sequences From Metagenomes.
   IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1840-1849. doi: 10.1109/TCBB.2020.3044575. Epub 2022 Jun 3.
3. DETIRE: a hybrid deep learning model for identifying viral sequences from metagenomes.
   Front Microbiol. 2023 Jun 16;14:1169791. doi: 10.3389/fmicb.2023.1169791. eCollection 2023.
4. CoCoNet: an efficient deep learning tool for viral metagenome binning.
   Bioinformatics. 2021 Sep 29;37(18):2803-2810. doi: 10.1093/bioinformatics/btab213.
5. Identifying viruses from metagenomic data using deep learning.
   Quant Biol. 2020 Mar;8(1):64-77. doi: 10.1007/s40484-019-0187-4.
6. VirGrapher: a graph-based viral identifier for long sequences from metagenomes.
   Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae036.
7. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data.
   Microbiome. 2017 Jul 6;5(1):69. doi: 10.1186/s40168-017-0283-5.
8. Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network.
   PLoS Comput Biol. 2021 Sep 22;17(9):e1009345. doi: 10.1371/journal.pcbi.1009345. eCollection 2021 Sep.
9. PlasGUN: gene prediction in plasmid metagenomic short reads using deep learning.
   Bioinformatics. 2020 May 1;36(10):3239-3241. doi: 10.1093/bioinformatics/btaa103.
10. Virsearcher: Identifying Bacteriophages from Metagenomes by Combining Convolutional Neural Network and Gene Information.
    IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):763-774. doi: 10.1109/TCBB.2022.3161135. Epub 2023 Feb 3.

Cited by

1. Phage quest: a beginner's guide to explore viral diversity in the prokaryotic world.
   Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf449.
2. NextVir: Enabling classification of tumor-causing viruses with genomic foundation models.
   PLoS Comput Biol. 2025 Aug 21;21(8):e1013360. doi: 10.1371/journal.pcbi.1013360. eCollection 2025 Aug.
3. VirNucPro: an identifier for the identification of viral short sequences using six-frame translation and large language models.
   Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf224.
4. FGeneBERT: function-driven pre-trained gene language model for metagenomics.
   Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf149.
5. HPV-KITE: sequence analysis software for rapid HPV genotype detection.
   Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf155.
6. A review of neural networks for metagenomic binning.
   Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf065.
7. Uncovering the hidden RNA virus diversity in Lake Nam Co: Evolutionary insights from an extreme high-altitude environment.
   Proc Natl Acad Sci U S A. 2025 Feb 11;122(6):e2420162122. doi: 10.1073/pnas.2420162122. Epub 2025 Feb 4.
8. VirDetect-AI: a residual and convolutional neural network-based metagenomic tool for eukaryotic viral protein identification.
   Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf001.
9. DeePhafier: a phage lifestyle classifier using a multilayer self-attention neural network combining protein information.
   Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae377.
10. Hecatomb: an integrated software platform for viral metagenomics.
    Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae020.