Luo Xin, Wang Liwei, Hu Pengwei, Hu Lun
IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):3182-3194. doi: 10.1109/TCBB.2023.3273567. Epub 2023 Oct 9.
Protein-protein interactions (PPIs) play a critical role in proteomics research, and a variety of computational algorithms have been developed to predict them. Though effective, these algorithms are constrained by the high false-positive and false-negative rates observed in PPI data. To overcome this problem, this work proposes a novel PPI prediction algorithm, PASNVGA, which combines the sequence and network information of proteins via a variational graph autoencoder. PASNVGA first applies different strategies to extract protein features from sequence and network information, and uses principal component analysis to obtain a more compact form of these features. In addition, PASNVGA designs a scoring function that measures the higher-order connectivity between proteins and thereby yields a higher-order adjacency matrix. With these features and adjacency matrices, PASNVGA trains a variational graph autoencoder to learn integrated embeddings of proteins. The prediction task is then completed by a simple feedforward neural network. Extensive experiments have been conducted on five PPI datasets collected from different species. Comparisons with several state-of-the-art algorithms show PASNVGA to be a promising PPI prediction algorithm.
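The stages named in the abstract (feature compression with principal component analysis, a higher-order adjacency matrix, and a variational graph autoencoder) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the scoring function, so `higher_order_adjacency` substitutes a Jaccard overlap of neighbor sets as a stand-in; the function names, the threshold, the layer sizes, and the untrained random weights are all assumptions made for illustration. The encoder follows the standard variational graph autoencoder forward pass (shared GCN layer, mean and log-variance heads, inner-product decoder) and omits training and the final feedforward classifier.

```python
import numpy as np

def pca_compress(features, n_components):
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def higher_order_adjacency(adj, threshold=0.5):
    """Assumed stand-in score (Jaccard overlap of neighbor sets), thresholded.

    The paper's actual scoring function is not given in the abstract.
    """
    adj = np.asarray(adj, dtype=float)
    common = adj @ adj                        # common[i, j] = shared neighbors
    degree = adj.sum(axis=1)
    union = degree[:, None] + degree[None, :] - common
    score = np.divide(common, union, out=np.zeros_like(common), where=union > 0)
    np.fill_diagonal(score, 0.0)              # ignore self-pairs
    return (score >= threshold).astype(float)

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by GCN layers."""
    a_hat = np.asarray(adj, dtype=float) + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def vgae_encode(adj, x, w0, w_mu, w_logvar, rng):
    """Forward pass of a two-layer VGAE encoder with untrained weights."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ x @ w0, 0.0)  # shared GCN layer with ReLU
    mu = a_norm @ hidden @ w_mu
    logvar = a_norm @ hidden @ w_logvar
    # Reparameterization: sample z = mu + sigma * eps with eps ~ N(0, I).
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    return z, mu, logvar

def decode(z):
    """Inner-product decoder: sigmoid(z z^T) as edge probabilities."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

# Toy network of four proteins with edges 0-1, 0-2, 1-2, 2-3.
rng = np.random.default_rng(0)
toy_adj = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 0],
                    [1, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
toy_features = pca_compress(rng.normal(size=(4, 6)), n_components=3)
toy_high = higher_order_adjacency(toy_adj)
weights = [0.1 * rng.normal(size=s) for s in [(3, 4), (4, 2), (4, 2)]]
toy_z, _, _ = vgae_encode(toy_high, toy_features, *weights, rng)
toy_probs = decode(toy_z)
```

In this toy graph, proteins 0 and 3 are not directly connected but share neighbor 2, so the assumed Jaccard score links them in the higher-order matrix. How the first-order and higher-order matrices are combined during training is not described in the abstract, so the sketch simply passes one of them to the encoder.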