Vieira M Fernanda, Duarte José, Domingues Rita, Oliveira Hugo, Dias Oscar
Center of Biological Engineering, University of Minho, 4710-057, Braga, Portugal.
Center of Biological Engineering, University of Minho, 4710-057, Braga, Portugal; LABBELS - Associate Laboratory, Braga/Guimarães, Portugal.
Comput Biol Med. 2025 Apr;188:109836. doi: 10.1016/j.compbiomed.2025.109836. Epub 2025 Feb 13.
Bacteriophages (phages) are the most predominant and genetically diverse biological entities on Earth. Phages are viruses that infect bacteria and encode numerous proteins with potential biotechnological applications. However, most phage-encoded proteins remain functionally uncharacterized. Depolymerases (DPOs) in particular, enzymes that degrade external polysaccharide structures, have garnered increasing interest both from a fundamental research standpoint and for biotechnological applications to control bacterial pathogens. Although several tools for predicting DPOs in phage genomes already exist, we introduce PhageDPO as a robust and reliable alternative. PhageDPO is trained on a comprehensive dataset that includes sequences related to seven specific DPO-related domains, complemented with DPOs validated in the literature. The trained Support Vector Machine (SVM) model achieved a test accuracy of 96%, a recall of 97%, a precision of 94%, and an F1-score of 96%, demonstrating its capability to predict DPOs in phage genomes. The model was further validated using both cases reported in the literature and data newly generated for this study, which enhanced its performance. Beyond its predictive performance, PhageDPO offers a user-friendly interface, making it more accessible and effective than other tools with graphical interfaces.
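For readers less familiar with the four metrics reported above, the following minimal sketch shows how accuracy, precision, recall, and F1-score are derived from a binary confusion matrix. It is not the PhageDPO code; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp: DPOs correctly predicted as DPOs
    fp: non-DPOs wrongly predicted as DPOs
    tn: non-DPOs correctly predicted as non-DPOs
    fn: DPOs wrongly predicted as non-DPOs
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a 200-sequence test set (NOT taken from the study).
metrics = classification_metrics(tp=97, fp=6, tn=94, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall measures how many true DPOs are recovered, while precision measures how many predicted DPOs are genuine, so a recall higher than precision, as reported here, indicates a model that misses few DPOs at the cost of slightly more false positives.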