State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China.
Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae484.
Phages, the natural predators of bacteria, were discovered more than 100 years ago, and increasing antimicrobial resistance rates have revitalized phage research. Methods that are less time-consuming and more efficient than wet-laboratory experiments are needed to help screen phages quickly for therapeutic use. Traditional computational methods usually ignore the fact that phage-bacteria interactions are achieved by key genes and proteins. Methods for intraspecific prediction are rare since almost all existing methods consider only interactions at the species and genus levels. Moreover, most strains in existing databases contain only partial genome information because whole-genome information for species is difficult to obtain. Here, we propose a new approach for interaction prediction that constructs new features from key genes and proteins and applies K-means sampling to select high-quality negative samples for prediction. We then develop DeepPBI-KG, a corresponding prediction tool based on feature selection and a deep neural network. The results show that the average per-strain area under the curve (AUC) reached 0.93, and the overall AUC and area under the precision-recall curve reached 0.89 and 0.92, respectively, on the independent test set; these values are greater than those of other existing prediction tools. The forward and reverse validation results indicate that key genes and key proteins regulate and influence the interaction, which supports the reliability of the model. In addition, intraspecific prediction experiments based on Klebsiella pneumoniae data demonstrate the potential applicability of DeepPBI-KG for intraspecific prediction.
In summary, the feature engineering and interaction prediction approaches proposed in this study can effectively improve the robustness and stability of interaction prediction, can achieve high generalizability, and may provide new directions and insights for rapid phage screening for therapy.
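The abstract does not spell out how K-means sampling selects negative samples, so the following is only a minimal sketch of one common reading: cluster the feature vectors of candidate (non-interacting or unlabeled) phage-bacterium pairs, then draw negatives evenly from every cluster so that the negative set covers the feature space instead of concentrating in one region. The function names `kmeans` and `sample_negatives`, the parameter `k`, the round-robin draw, and the toy 2-D features are all assumptions made for illustration; they are not taken from DeepPBI-KG.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric vectors.

    Returns (centroids, labels). Illustrative only; a library
    implementation would normally be used on real feature matrices.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centroids, labels


def sample_negatives(candidates, n_neg, k=3, seed=0):
    """Return sorted indices of up to n_neg candidates, spread over k clusters.

    Assumption: `candidates` holds feature vectors of pairs that are
    treated as possible negatives; how such vectors are built from key
    genes and proteins is outside this sketch.
    """
    _, labels = kmeans(candidates, k, seed=seed)
    rng = random.Random(seed)
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    for members in clusters:
        rng.shuffle(members)
    chosen = []
    # Round-robin over clusters so no single cluster dominates the negatives.
    while len(chosen) < n_neg and any(clusters):
        for members in clusters:
            if members and len(chosen) < n_neg:
                chosen.append(members.pop())
    return sorted(chosen)


# Toy usage with made-up 2-D features (hypothetical data).
toy_rng = random.Random(42)
toy_candidates = [(toy_rng.gauss(cx, 0.3), toy_rng.gauss(cy, 0.3))
                  for cx, cy in [(0, 0), (5, 5), (10, 0)] for _ in range(20)]
negative_indices = sample_negatives(toy_candidates, n_neg=12, k=3, seed=1)
```

Drawing evenly across clusters is one design choice among several; selecting the points closest to each centroid, or the points farthest from known positive pairs, would be equally plausible variants of cluster-based negative selection.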