Wei Ankang, Xiao Zhen, Fu Lingling, Zhao Weizhong, Jiang Xingpeng
School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China.
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae672.
Identifying phage-host interactions (PHIs) is a crucial step in developing phage therapy, a promising solution to the problem of antibiotic resistance in superbugs. However, phages depend strongly on their hosts for their life activities, which limits their cultivability and makes identifying PHIs through traditional wet-lab experiments time-consuming and labor-intensive. Although many deep learning (DL) approaches have been applied to PHI prediction, most of them rely predominantly on sequence information and fail to comprehensively model the intricate relationships within PHIs. Moreover, most existing approaches achieve sub-optimal performance because the high data sparsity of the PHI prediction task creates a risk of overfitting. In this study, we propose a novel approach called MI-RGC, which introduces mutual information for feature augmentation and employs regional graph convolution to learn meaningful representations. Specifically, MI-RGC treats the presence status of phages in environmental samples as random variables and takes the mutual information between these random variables as the dependency relationships among phages. Consequently, a mutual information-based heterogeneous network is constructed as feature augmentation for the sequence information of phages, which is used to build a sequence information-based heterogeneous network. To account for the different contributions of neighboring nodes at varying distances, a regional graph convolutional model is designed in which the neighboring nodes are segmented into different regions and a regional-level attention mechanism is employed to derive node embeddings. Finally, the embeddings learned from these two networks are aggregated through an attention mechanism, and PHIs are predicted from the aggregated embeddings. Experimental results on three benchmark datasets demonstrate that MI-RGC outperforms other methods on the task of PHI prediction.
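The mutual information step described in the abstract can be illustrated with a minimal sketch. It assumes each phage is represented by a binary presence/absence vector over the same environmental samples and estimates mutual information from empirical frequencies. The function names, the example phage labels, and the idea of keeping only pairs above a threshold are assumptions made for illustration; the abstract does not state how MI-RGC turns mutual information values into network edges.

```python
import math
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two binary
    presence/absence vectors observed over the same samples."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        p_a = sum(1 for u in x if u == a) / n
        for b in (0, 1):
            p_b = sum(1 for v in y if v == b) / n
            p_ab = sum(1 for u, v in zip(x, y) if u == a and v == b) / n
            # Terms with zero joint probability contribute nothing.
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def mi_edges(presence, threshold):
    """Return (phage_i, phage_j, mi) for every pair whose mutual
    information exceeds `threshold`.

    `presence` maps a phage name to its 0/1 vector across samples.
    The threshold rule is a hypothetical edge-selection criterion,
    not one taken from the paper.
    """
    edges = []
    for (name_i, x), (name_j, y) in combinations(presence.items(), 2):
        mi = mutual_information(x, y)
        if mi > threshold:
            edges.append((name_i, name_j, mi))
    return edges
```

For example, calling `mi_edges({"phage_A": [1, 1, 0, 0, 1, 0], "phage_B": [1, 1, 0, 0, 1, 0], "phage_C": [1, 0, 1, 0, 1, 0]}, threshold=0.1)` keeps only the pair whose presence patterns coincide across samples, since the other pairs carry little shared information.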