IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1801-1810. doi: 10.1109/TCBB.2020.3017386. Epub 2021 Oct 7.
Multi-drug resistance (MDR) has become one of the greatest threats to human health worldwide, and new treatments for infections caused by MDR bacteria are urgently needed. Phage therapy is a promising alternative, and its key step is correctly matching the target pathogenic bacterium with a corresponding therapeutic phage. Deep learning is well suited to mining complex patterns and producing accurate predictions. In this study, we develop PredPHI (Predicting Phage-Host Interactions), a deep learning-based tool that predicts the hosts of phages from sequence data. We collect more than 3000 phage-host pairs, together with their protein sequences, from the PhagesDB and GenBank databases and extract a set of features from them. We then select high-quality negative samples using K-Means clustering and construct a balanced training set. Finally, we build the predictive model with a deep convolutional neural network. PredPHI achieves an area under the receiver operating characteristic curve (AUC) of 0.81 on the test set, and selecting negative samples by clustering is significantly more robust than selecting them at random. These results indicate that PredPHI is a useful and accurate tool for identifying phage-host interactions from sequence data.
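The abstract does not say exactly how K-Means clustering is used to pick negative samples. As an illustration only, the sketch below assumes one plausible scheme: hosts are clustered by their feature vectors, and for each known phage-host pair a negative host is drawn from a cluster different from that of the true host, giving at most one negative per positive and hence a roughly balanced training set. The functions kmeans and select_negatives, the toy host feature vectors, and the cluster count k are hypothetical and are not taken from PredPHI.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments did not change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_negatives(
    positives: List[Tuple[str, str]],
    host_features: Dict[str, Vector],
    k: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Hypothetical scheme: at most one negative (phage, host) pair per positive pair,
    with the negative host drawn from a different K-Means cluster than the true host."""
    hosts = sorted(host_features)
    cluster = dict(zip(hosts, kmeans([host_features[h] for h in hosts], k, seed=seed)))
    known = set(positives)
    chosen: Set[Tuple[str, str]] = set()
    rng = random.Random(seed)
    negatives: List[Tuple[str, str]] = []
    for phage, true_host in positives:
        candidates = [
            h for h in hosts
            if cluster[h] != cluster[true_host]  # dissimilar to the true host
            and (phage, h) not in known          # never a known interaction
            and (phage, h) not in chosen         # avoid duplicate negatives
        ]
        if candidates:  # skip this positive if no eligible host exists
            pair = (phage, rng.choice(candidates))
            chosen.add(pair)
            negatives.append(pair)
    return negatives


if __name__ == "__main__":
    # Toy data: two well-separated groups of hosts in a 2-D feature space.
    host_features = {"hA1": [0.0, 0.0], "hA2": [0.1, 0.0], "hB1": [5.0, 5.0], "hB2": [5.1, 5.0]}
    positives = [("p1", "hA1"), ("p2", "hB1")]
    print(select_negatives(positives, host_features, k=2))
```

Drawing negatives from clusters other than the true host's is presumably meant to lower the chance of labelling an undiscovered true interaction as negative, which random sampling cannot rule out; whether PredPHI uses this exact rule, or clusters phages or pairs instead, should be checked against the full paper.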