South African National Bioinformatics Institute, South African MRC Bioinformatics Unit.
Department of Mathematics and Applied Mathematics, University of the Western Cape, Bellville, South Africa.
Bioinformatics. 2018 Dec 15;34(24):4159-4164. doi: 10.1093/bioinformatics/bty504.
Amino acid triplets have been used successfully as features for predicting human-HPV protein-protein interactions (PPI). However, the utility of supervised learning methods is limited because experimental data are not available in sufficient quantities. Improvements in machine learning techniques and feature selection will enhance the study of PPI between host and pathogen.
We present a comparison of a neural network model and an SVM for the prediction of host-pathogen PPI based on a combination of features: amino acid quadruplets, pairwise sequence similarity and human interactome properties. The neural network and the SVM were implemented using the Python scikit-learn (sklearn) library. The neural network model using quadruplet features and other network features outperforms the SVM model. The models are tested against published predictors and then applied to the human-B. anthracis case. Gene ontology term enrichment analysis identifies immune response and regulation as functions of the interacting proteins. For prediction of human-viral PPI, our neural network model is a significant improvement in overall performance over a predictor using the triplet feature, and it achieves good accuracy in predicting human-B. anthracis PPI.
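As an illustration only, the quadruplet feature and the model comparison described above might be sketched as follows. This is not the authors' code. The 7-class residue grouping is an assumption borrowed from the commonly used conjoint-triad encoding, and the function names (`quadruplet_features`, `pair_features`, `compare_models`) and the model hyperparameters are hypothetical. The sequence similarity and interactome features are omitted.

```python
from itertools import product

# Assumption: the 7-class residue grouping used by conjoint-triad encodings.
# The abstract does not state which grouping (if any) the authors used.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(CLASSES) for aa in group}

# Map each of the 7**4 = 2401 class quadruplets to a feature index.
QUAD_INDEX = {"".join(q): i for i, q in enumerate(product("0123456", repeat=4))}


def quadruplet_features(seq):
    """Return normalized counts of class quadruplets over a sliding window."""
    counts = [0.0] * len(QUAD_INDEX)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        # Skip windows containing non-standard residues such as X or B.
        if all(aa in AA_TO_CLASS for aa in window):
            key = "".join(AA_TO_CLASS[aa] for aa in window)
            counts[QUAD_INDEX[key]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def pair_features(host_seq, pathogen_seq):
    """Concatenate the quadruplet vectors of a host and a pathogen protein."""
    return quadruplet_features(host_seq) + quadruplet_features(pathogen_seq)


def compare_models(X, y, cv=5):
    """Cross-validate a neural network and an SVM on the same features."""
    # Imported lazily so the feature code above runs without scikit-learn.
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Hypothetical settings, not the values used in the paper.
    models = {
        "neural_network": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
        ),
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in models.items()
    }
```

Given a list of labelled protein pairs, `X` would be built by calling `pair_features` on each host-pathogen pair and `y` would hold 1 for interacting and 0 for non-interacting pairs. `compare_models(X, y)` then returns the mean cross-validated accuracy of each model.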
All code can be downloaded from ftp://ftp.sanbi.ac.za/machine_learning/.
Supplementary data are available at Bioinformatics online.