Pan Jie, Wang Rui, Liu Wenjing, Wang Li, You Zhuhong, Li Yuechao, Duan Zhemeng, Huang Qinghua, Feng Jie, Sun Yanmei, Wang Shiwei
Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China.
Department of Ophthalmology, The First Affiliated Hospital of Northwest University, 30 Fenxiang, the South Avenue, Xi'an, Shaanxi 710002, China.
iScience. 2024 Dec 19;28(1):111647. doi: 10.1016/j.isci.2024.111647. eCollection 2025 Jan 17.
Bacteriophages (phages) are increasingly viewed as a promising alternative for treating antibiotic-resistant bacterial infections. However, the diversity of phage host ranges complicates the identification of phages that target a given bacterium, and existing computational tools often fail to identify such phages accurately across different bacterial species. In this study, we present GE-PHI, a machine-learning-based model that predicts phage-host interactions (PHIs) by integrating a knowledge graph embedding algorithm with a large-scale protein language model. First, a phage-host heterogeneous association network (PHAN) was constructed that incorporated phage-phage and host-host similarity networks. Then, multi-relational Poincaré graph embedding (MuRP) was used to extract topological patterns from this network. Additionally, we employed the ESM-2 protein language model to capture evolutionary information from phage tail proteins and host-receptor-binding proteins. GE-PHI achieved a cross-validation area under the curve (AUC) of up to 0.9453 in silico and maintained this performance in case studies. This study provides insights into machine-learning-guided phage therapeutics and diagnostics in microbial engineering.
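As a rough illustration of the hyperbolic geometry that Poincaré-ball embeddings such as MuRP build on, the sketch below computes the standard Poincaré distance between two points inside the unit ball. It is not the GE-PHI implementation and omits MuRP's relation-specific transformations; the function name `poincare_distance` and the 2-D coordinates are hypothetical, chosen only for illustration.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    if len(u) != len(v):
        raise ValueError("points must have the same dimension")
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


# Hypothetical 2-D embeddings (illustrative values, not from the paper).
phage = (0.5, 0.0)
host_near = (0.6, 0.1)
host_far = (-0.7, 0.3)

d_near = poincare_distance(phage, host_near)
d_far = poincare_distance(phage, host_far)
print(f"near: {d_near:.3f}, far: {d_far:.3f}")
```

In an embedding-based predictor, a smaller distance between a phage and a host would typically be read as a stronger candidate interaction; how GE-PHI combines such topological features with ESM-2 protein features is not specified in the abstract.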