Sirén Kimmo, Millard Andrew, Petersen Bent, Gilbert M Thomas P, Clokie Martha R J, Sicheritz-Pontén Thomas
Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Copenhagen, 1353 Denmark.
Department of Genetics and Genome Biology, University of Leicester, LE1 7RH Leicester, UK.
NAR Genom Bioinform. 2021 Jan 6;3(1):lqaa109. doi: 10.1093/nargab/lqaa109. eCollection 2021 Mar.
Prophages are phages integrated into bacterial genomes, and they are key to understanding many aspects of bacterial biology. Their extreme diversity makes them challenging to detect using sequence similarity, yet this remains the paradigm, and thus many phages remain unidentified. We present a novel, fast and generalizing machine learning method based on feature space to facilitate novel prophage discovery. To validate the approach, we reanalyzed publicly available marine viromes and single-cell genomes using our feature-based approach and consistently found more phages than were detected using current state-of-the-art tools, while being notably faster. This demonstrates that our approach significantly enhances bacteriophage discovery and thus provides a new starting point for exploring new biologies.