Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses.

Affiliations

Department of Computer Engineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkey.

Department of Information Technology, Faculty of Computer Engineering and Information Technology, Azarbaijan Shahid Madani University, Tabriz, Iran.

Publication information

Comput Biol Chem. 2022 Dec;101:107755. doi: 10.1016/j.compbiolchem.2022.107755. Epub 2022 Aug 13.

Abstract

Computational identification of human-virus protein-protein interactions (PHIs) is a worthwhile step towards understanding infection mechanisms. Analysis of PHI networks is important for the determination of pathogenic diseases. Prediction of these interactions is a popular problem because experimental detection of PHIs is both time-consuming and expensive. The available methods use biological features such as amino acid sequences, molecular structure, or biological activities for prediction. Recent studies show that the topological properties of proteins in protein-protein interaction (PPI) networks increase prediction performance. Basic network projections, random-walk-based models, or graph neural networks are used for generating topologically enriched (hybrid) protein embeddings. In this study, we propose a three-stage machine learning pipeline that generates and uses hybrid embeddings for PHI prediction. In the first stage, numerical features are extracted from the amino acid sequences using the Doc2Vec and Byte Pair Encoding methods. In the second stage, these amino acid embeddings are used as node features while training a modified GraphSAGE model, which is an improved version of the graph convolutional network. Lastly, the hybrid protein embeddings are used for training a binary interaction classifier that predicts whether there is an interaction between two given proteins. The proposed method is evaluated with comprehensive experiments to test its functionality and compare it with state-of-the-art methods. The experimental results on the benchmark dataset show that the proposed model achieves a 3-23% better area under the curve (AUC) score than its competitors.
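
To make the data flow of the three stages more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that sequence embeddings already exist (here they are hypothetical two-dimensional vectors rather than Doc2Vec output), it replaces the modified GraphSAGE model with a single mean-aggregation step that has no learned weights or nonlinearity, and it stops at building the input vector for a pair classifier. All protein names, function names, and values are illustrative assumptions.

```python
# Toy sketch only: names, sizes and data below are illustrative assumptions,
# not the paper's actual model or dataset.

def mean_vector(vectors, dim):
    """Element-wise mean of a list of equal-length vectors (zeros if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_mean_layer(features, neighbors):
    """One GraphSAGE-style step with a mean aggregator and no learned weights.

    Each node's new embedding is its own feature vector concatenated with the
    mean of its neighbours' feature vectors.
    """
    dim = len(next(iter(features.values())))
    return {
        node: feats + mean_vector([features[n] for n in neighbors.get(node, [])], dim)
        for node, feats in features.items()
    }


def pair_features(embeddings, protein_a, protein_b):
    """Concatenate two protein embeddings into one classifier input vector."""
    return embeddings[protein_a] + embeddings[protein_b]


# Stage 1 stand-in: hypothetical 2-dimensional sequence embeddings.
sequence_features = {
    "human_P1": [1.0, 0.0],
    "human_P2": [0.0, 1.0],
    "virus_V1": [0.5, 0.5],
}
# Known interactions used as graph edges (undirected adjacency lists).
ppi_neighbors = {
    "human_P1": ["human_P2", "virus_V1"],
    "human_P2": ["human_P1"],
    "virus_V1": ["human_P1"],
}

# Stage 2 stand-in: topology-enriched ("hybrid") embeddings.
hybrid = sage_mean_layer(sequence_features, ppi_neighbors)
# Stage 3 input: features for one candidate virus-human pair.
candidate = pair_features(hybrid, "virus_V1", "human_P2")
```

In this sketch each hybrid embedding has twice the dimension of the sequence embedding, and the candidate vector would be passed to whatever binary classifier is chosen for the third stage. A trained GraphSAGE layer would additionally apply learned weight matrices and a nonlinearity, which are omitted here.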
