HVSeeker: a deep-learning-based method for identification of host and viral DNA sequences.

Author information

Al-Najim Abdullatif, Hauns Sven, Tran Van Dinh, Backofen Rolf, Alkhnbashi Omer S

Affiliations

Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 34462, Saudi Arabia.

Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 101, 79110 Freiburg, Germany.

Publication information

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf037.

Abstract

BACKGROUND

Bacteriophages are among the most abundant organisms on Earth, significantly impacting ecosystems and human society. Identifying viral sequences, especially novel ones, in mixed metagenomes is a critical first step in analyzing the viral components of host samples, and it plays a key role in many downstream tasks. However, the task is challenging because viruses evolve rapidly. The identification process typically involves two steps: distinguishing viral sequences from host sequences, and determining whether they come from novel viral genomes. Traditional metagenomic techniques that rely on sequence similarity to known entities often fall short, especially when dealing with short or novel genomes. Meanwhile, deep learning has demonstrated its efficacy across various domains, including bioinformatics.

RESULTS

We have developed HVSeeker, a host/virus seeker method based on deep learning, to distinguish between bacterial and phage sequences. HVSeeker consists of two separate models: one analyzes DNA sequences and the other focuses on proteins. In addition to the robust architecture of HVSeeker, three distinct preprocessing methods were introduced to enhance the learning process: padding, contig assembly, and sliding window. The method has shown promising results on sequences of various lengths, ranging from 200 to 1,500 base pairs. Tested on both the NCBI and IMGVR databases, HVSeeker outperformed several methods from the literature, such as Seeker, Rnn-VirSeeker, DeepVirFinder, and PPR-Meta. Moreover, when compared with other methods on benchmark datasets, HVSeeker showed better performance, establishing its effectiveness in identifying unknown phage genomes.
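
The abstract names padding and sliding window as preprocessing methods but does not give their details. The following is a minimal Python sketch of generic versions of these two ideas, not the authors' implementation. The function names, the pad character `N`, and the window and stride values are all hypothetical choices made for illustration.

```python
from typing import List


def pad_sequence(seq: str, target_len: int, pad_char: str = "N") -> str:
    """Right-pad a DNA sequence to target_len.

    Assumption: padding uses the ambiguity symbol 'N'; the abstract does
    not say how HVSeeker pads. Longer sequences are returned unchanged.
    """
    return seq + pad_char * max(0, target_len - len(seq))


def sliding_windows(seq: str, window: int, stride: int) -> List[str]:
    """Split a sequence into fixed-length, possibly overlapping windows.

    Assumption: a sequence shorter than one window is padded, and a final
    window aligned to the end of the sequence is added when the stride
    does not reach the last bases exactly.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(seq) <= window:
        return [pad_sequence(seq, window)]
    last_start = len(seq) - window
    chunks = [seq[i:i + window] for i in range(0, last_start + 1, stride)]
    if last_start % stride != 0:
        # Cover the tail with one extra window anchored at the sequence end.
        chunks.append(seq[-window:])
    return chunks


if __name__ == "__main__":
    # Hypothetical toy input; real inputs would be 200 to 1,500 bp fragments.
    print(pad_sequence("ACG", 5))
    print(sliding_windows("ACGTACGTAC", window=4, stride=3))
```

Either function yields fixed-length strings, which is what a model with a fixed input size needs; whether HVSeeker combines the two in this way is not stated in the abstract.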

CONCLUSIONS

These results demonstrate the strength of HVSeeker's design, which encompasses both the preprocessing methods and the model architecture. The advancements provided by HVSeeker are significant for identifying viral genomes and developing new therapeutic approaches, such as phage therapy. HVSeeker therefore serves as a useful tool in prokaryotic and phage taxonomy, offering a crucial first step toward analyzing the host-viral component of samples by identifying the host and viral sequences in mixed metagenomes.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2d0f/12080225/4aeee1e474ab/giaf037fig1.jpg
