National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Nucleic Acids Res. 2020 Dec 2;48(21):e121. doi: 10.1093/nar/gkaa856.
Recent advances in metagenomic sequencing have enabled the discovery of diverse, distinct microbes and viruses. Bacteriophages, the most abundant biological entities on Earth, evolve rapidly, so detecting unknown bacteriophages in sequence datasets remains a challenge. Most existing detection methods rely on sequence similarity to known bacteriophage sequences, which impedes the identification and characterization of distinct, highly divergent bacteriophage families. Here we present Seeker, a deep-learning tool for alignment-free identification of phage sequences. Seeker allows rapid detection of phages in sequence datasets and differentiation of phage sequences from bacterial ones, even when those phages exhibit little sequence similarity to established phage families. We comprehensively validate Seeker's ability to identify previously unidentified phages, and employ this method to detect unknown phages, some of which are highly divergent from the known phage families. We provide a web portal (seeker.pythonanywhere.com) and a user-friendly Python package (github.com/gussow/seeker) that allow researchers to easily apply Seeker in metagenomic studies to detect diverse unknown bacteriophages.