State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China.
State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China.
Bioinformatics. 2021 Dec 11;37(24):4771-4778. doi: 10.1093/bioinformatics/btab533.
To complement experimental efforts, machine learning-based computational methods are playing an increasingly important role in predicting human-virus protein-protein interactions (PPIs). Furthermore, transfer learning can effectively apply prior knowledge obtained from a large source dataset/task to a small target dataset/task, improving prediction performance.
To predict interactions between human and viral proteins, we combine evolutionary sequence profile features with a Siamese convolutional neural network (CNN) architecture and a multi-layer perceptron. Our architecture outperforms machine learning methods based on various feature encodings as well as state-of-the-art prediction methods. As our main contribution, we introduce two transfer learning methods (i.e. 'frozen' type and 'fine-tuning' type) that reliably predict interactions in a target human-virus domain based on training in a source human-virus domain, by retraining CNN layers. Finally, we use the 'frozen' type transfer learning approach to predict human-SARS-CoV-2 PPIs, and find that our predictions are topologically and functionally similar to experimentally known interactions.
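As a rough illustration of the architecture described above, the following dependency-free Python sketch shows a Siamese forward pass (one shared set of convolution kernels encodes both protein profiles, and a small multi-layer perceptron scores the concatenated features) together with the difference between the two transfer types. It is not the authors' implementation. All function names, layer sizes and the 20-column profile width are assumptions made for illustration, and because the abstract does not spell out which layers each type updates, the sketch assumes the common reading: 'frozen' keeps the source-trained CNN weights fixed and updates only the perceptron, while 'fine-tuning' updates both.

```python
import math
import random

PROFILE_DIM = 20  # assumed: one profile column per standard amino acid


def init_model(rng, n_filters=4, width=3, hidden=8):
    """Random weights; 'cnn' is shared by both branches, 'mlp' is the classifier."""
    def w():
        return rng.uniform(-0.1, 0.1)
    cnn = [[[w() for _ in range(PROFILE_DIM)] for _ in range(width)]
           for _ in range(n_filters)]
    w1 = [[w() for _ in range(2 * n_filters)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [w() for _ in range(hidden)]
    return {"cnn": cnn, "mlp": [w1, b1, w2, 0.0]}


def conv_encode(profile, kernels):
    """1-D convolution over sequence positions, ReLU, then global max pooling."""
    features = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # starting at 0 makes the running max equal to max(ReLU(act))
        for start in range(len(profile) - width + 1):
            act = sum(kernel[i][j] * profile[start + i][j]
                      for i in range(width) for j in range(PROFILE_DIM))
            best = max(best, act)
        features.append(best)
    return features


def predict(model, human_profile, virus_profile):
    """Interaction probability for one human-virus protein pair."""
    # Siamese step: the SAME kernels encode both proteins.
    x = conv_encode(human_profile, model["cnn"]) + conv_encode(virus_profile, model["cnn"])
    w1, b1, w2, b2 = model["mlp"]
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def trainable_groups(mode):
    """Parameter groups updated on the target domain (assumed semantics)."""
    if mode == "frozen":
        return ["mlp"]          # CNN weights stay as learned on the source domain
    if mode == "fine-tuning":
        return ["cnn", "mlp"]   # source weights serve only as initialisation
    raise ValueError(f"unknown transfer mode: {mode}")


def _step(param, grad, lr):
    """Apply param - lr * grad through arbitrarily nested lists of floats."""
    if isinstance(param, list):
        return [_step(p, g, lr) for p, g in zip(param, grad)]
    return param - lr * grad


def sgd_step(model, grads, mode, lr=0.01):
    """One gradient step that touches only the groups allowed by `mode`."""
    updated = dict(model)
    for group in trainable_groups(mode):
        updated[group] = _step(model[group], grads[group], lr)
    return updated


rng = random.Random(0)
source_model = init_model(rng)  # stands in for a model trained on the source domain
human = [[rng.random() for _ in range(PROFILE_DIM)] for _ in range(30)]
virus = [[rng.random() for _ in range(PROFILE_DIM)] for _ in range(25)]
score = predict(source_model, human, virus)
```

Gradient computation is deliberately left out: `sgd_step` expects gradients with the same nested shape as the parameters, which a deep learning framework would normally supply. The only point of the two modes is which parameter groups an update is allowed to change.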
The source code and datasets are available at https://github.com/XiaodiYangCAU/TransPPI/.
Supplementary data are available at Bioinformatics online.