State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China.
Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China.
Gigascience. 2021 Sep 8;10(9). doi: 10.1093/gigascience/giab056.
Prokaryotic viruses, referred to as phages, can be divided into virulent and temperate phages. Distinguishing virulent from temperate phage-derived sequences in metavirome data is important for elucidating their different roles in interactions with bacterial hosts and in the regulation of microbial communities. However, no experimental or computational approach effectively classifies these sequences in culture-independent metavirome data. We present a new computational method, DeePhage, which can directly and rapidly classify each read or contig as a virulent or temperate phage-derived fragment.
DeePhage uses one-hot encoding to represent DNA sequences base by base. A convolutional neural network detects sequence signatures to extract informative local features. In 5-fold cross-validation, the accuracy of DeePhage reaches 89%, nearly 10% and 30% higher than that of 2 similar tools, PhagePred and PHACTS, respectively. On real metavirome data, DeePhage correctly predicts the highest proportion of contigs when BLAST is used for annotation, without apparent preference. In addition, DeePhage runs 245 and 810 times faster than PhagePred and PHACTS, respectively, under the same computational configuration. By directly detecting temperate viral fragments in metagenome and metavirome data, we further propose a new strategy to explore phage transformations in the microbial community. The ability to detect such transformations offers new insight into potential treatments for human disease.
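As an illustration of the one-hot encoding and convolutional feature detection described above, the following is a minimal sketch in plain Python. It is not DeePhage's implementation: the channel order `ACGT`, the all-zero treatment of ambiguous bases, the helper names (`one_hot_encode`, `conv1d_valid`, `relu`), and the hand-set `tat_filter` are all assumptions made for the example. A trained network would learn many filters and their weights from data.

```python
# Illustrative sketch only; names and choices here are assumptions, not DeePhage's code.
BASES = "ACGT"  # assumed channel order for the one-hot vectors


def one_hot_encode(seq):
    """Map a DNA string to a list of 4-element rows.

    Bases outside ACGT (e.g. N) become all-zero rows (an assumed convention).
    """
    rows = []
    for base in seq.upper():
        row = [0.0] * len(BASES)
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows


def conv1d_valid(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence without padding.

    Returns one score per window, i.e. len(encoded) - width + 1 values.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(len(BASES)):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


# Hypothetical hand-set filter that responds most strongly to the 3-mer "TAT".
tat_filter = one_hot_encode("TAT")

encoded = one_hot_encode("GGTATNC")
activations = relu(conv1d_valid(encoded, tat_filter))
max_pooled = max(activations)  # global max pooling over positions
```

Here each window score counts the positions where the sequence matches the filter's 3-mer, so the window that exactly contains `TAT` scores highest. This is the sense in which a convolutional filter picks up a local sequence signature regardless of where it occurs in a read or contig.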
DeePhage is a novel tool developed to rapidly and efficiently identify the 2 kinds of phage fragments, especially for metagenomic analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.