Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China.
Gigascience. 2022 Aug 11;11. doi: 10.1093/gigascience/giac076.
Many biological properties of phages are determined by phage virion proteins (PVPs), and poor PVP annotation is a bottleneck in many areas of viral research, such as viral phylogenetic analysis, viral host identification, and antibacterial drug design. Because PVP sequences are highly diverse, annotating the PVPs of a phage genome remains a particularly challenging bioinformatic task.
We developed DeePVP, a deep learning-based tool. Its main module discriminates PVPs from non-PVPs within a phage genome, and its extended module further classifies the predicted PVPs into the 10 major PVP classes. In the PVP identification task, the main module of DeePVP outperforms current state-of-the-art tools, achieving a 9.05% higher F1-score. In the PVP classification task, the overall accuracy of the extended module is approximately 3.72% higher than that of PhANNs. Two application cases show that DeePVP's predictions are more reliable than those of current state-of-the-art tools and better reveal compact PVP-enriched regions. In particular, in the genome of Escherichia phage phiEC1, a novel PVP-enriched region that is conserved in many other Escherichia phage genomes was identified, indicating that DeePVP will be a useful tool for analyzing phage genomic structures.
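The two-module design and the reported metrics can be illustrated with a minimal sketch, assuming the trained networks are available as plain callables. `annotate_genome`, `is_pvp`, `classify`, the 0.5 threshold, and the placeholder class labels are all hypothetical and are not DeePVP's actual API. `f1_score` and `overall_accuracy` follow the standard definitions of the metrics quoted above, and `densest_window` is a generic sliding-window count over gene order used only to illustrate what a "compact PVP-enriched region" means; it is not presented as the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-ins for trained models (assumption, not DeePVP's interface).
BinaryModel = Callable[[str], float]  # returns P(sequence is a PVP)
ClassModel = Callable[[str], str]     # returns one of the PVP class labels


@dataclass
class Prediction:
    protein_id: str
    is_pvp: bool
    pvp_class: Optional[str]  # None when the protein is predicted non-PVP


def annotate_genome(proteins: Dict[str, str], is_pvp: BinaryModel,
                    classify: ClassModel, threshold: float = 0.5) -> List[Prediction]:
    """Stage 1: PVP vs non-PVP; stage 2: classify only the predicted PVPs."""
    results = []
    for pid, seq in proteins.items():  # dict order = gene order in the genome
        pvp = is_pvp(seq) >= threshold
        results.append(Prediction(pid, pvp, classify(seq) if pvp else None))
    return results


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Binary F1 = 2PR / (P + R), with PVP as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences differ in length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of proteins whose predicted class equals the true class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def densest_window(flags: Sequence[bool], k: int) -> Tuple[int, int]:
    """Return (start, count) of the k consecutive genes with the most predicted PVPs."""
    if not 0 < k <= len(flags):
        raise ValueError("window size must be between 1 and the number of genes")
    count = sum(flags[:k])
    best_start, best_count = 0, count
    for start in range(1, len(flags) - k + 1):
        # Slide by one gene: add the entering gene, drop the leaving gene.
        count += flags[start + k - 1] - flags[start - 1]
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count
```

Running the class model only on proteins that pass the binary filter mirrors the abstract's description of the extended module acting on predicted PVPs; as a consequence, any PVP missed in stage 1 cannot be classified in stage 2.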
DeePVP outperforms state-of-the-art tools. The program is provided both as a virtual machine with a graphical user interface and as a Docker image, so that users without computational expertise can run it easily. DeePVP is freely available at https://github.com/fangzcbio/DeePVP/.