IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1257-1266. doi: 10.1109/TCBB.2020.3003941. Epub 2022 Jun 3.
The semantic similarity of Gene Ontology (GO) terms is widely used to predict protein-protein interactions (PPIs). Traditional semantic similarity measures rely mainly on manually crafted features, which may miss important hidden information in the GO. Moreover, these methods usually derive the similarity between two proteins from the similarities between their GO terms using simple statistical rules, such as MAX and BMA (best-match average), which oversimplifies the potentially complex relationship between proteins and the GO terms annotated to them. To overcome these two deficiencies, we propose a new method named protein2vec, which characterizes a protein with a vector based on the GO terms annotated to it and combines information from both the GO and known PPIs. We first apply a network embedding algorithm to the GO network to generate a feature vector for each GO term. Then, a Long Short-Term Memory (LSTM) network encodes the feature vectors of the GO terms annotated to a protein into another vector (called the protein vector). Finally, the two protein vectors are fed into a feedforward neural network to predict the interaction between the two corresponding proteins. The experimental results show that protein2vec outperforms almost all commonly used traditional semantic similarity methods.
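To make the MAX and BMA rules mentioned in the abstract concrete, the following is a minimal sketch of how a protein-level score can be derived from a matrix of term-level similarities. The function names and the numeric values are hypothetical, and the BMA shown is one common formulation (the mean of the row-wise and column-wise best-match averages); the paper may use a slightly different variant.

```python
def max_similarity(sim):
    """MAX rule: the single highest term-pair similarity."""
    return max(max(row) for row in sim)


def bma_similarity(sim):
    """BMA rule (one common variant): average the best match of every
    term in each direction, then take the mean of the two averages."""
    # Best partner in protein B for each GO term of protein A (rows).
    row_best = [max(row) for row in sim]
    # Best partner in protein A for each GO term of protein B (columns).
    col_best = [max(col) for col in zip(*sim)]
    row_avg = sum(row_best) / len(row_best)
    col_avg = sum(col_best) / len(col_best)
    return (row_avg + col_avg) / 2.0


# Hypothetical term-term similarities (illustrative values only):
# rows are the 2 GO terms of protein A, columns the 3 GO terms of protein B.
sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
]

print("MAX:", max_similarity(sim))
print("BMA:", bma_similarity(sim))
```

For this matrix, MAX is 0.9 and BMA is approximately 0.775. Both rules collapse the whole matrix into a fixed statistic, which is the simplification that a learned protein vector is meant to avoid.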
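The three stages described in the abstract (GO term vectors, LSTM encoding, feedforward prediction) can be sketched as a forward pass in plain Python. This is an illustrative sketch under several assumptions, not the authors' implementation: the term vectors are random stand-ins for the output of a network embedding algorithm, the weights are untrained, bias terms are omitted, the two protein vectors are combined by concatenation, and all names, dimensions, and term labels are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def lstm_encode(term_vectors, params, hidden):
    """Run a single-layer LSTM over the GO term vectors of one protein
    and return the final hidden state as the protein vector."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in term_vectors:
        xh = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(z) for z in matvec(params["Wi"], xh)]    # input gate
        f = [sigmoid(z) for z in matvec(params["Wf"], xh)]    # forget gate
        o = [sigmoid(z) for z in matvec(params["Wo"], xh)]    # output gate
        g = [math.tanh(z) for z in matvec(params["Wg"], xh)]  # candidate cell state
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def predict_interaction(terms_a, terms_b, params, hidden):
    """Encode both proteins, concatenate the two protein vectors (an
    assumption), and score the pair with a one-hidden-layer network."""
    pair = lstm_encode(terms_a, params, hidden) + lstm_encode(terms_b, params, hidden)
    layer = [max(0.0, z) for z in matvec(params["W1"], pair)]  # ReLU
    logit = sum(w * a for w, a in zip(params["w2"], layer))
    return sigmoid(logit)


# Hypothetical sizes and untrained random weights, for illustration only.
EMBED, HIDDEN, FF = 4, 3, 5
rng = random.Random(0)
params = {name: rand_matrix(rng, HIDDEN, EMBED + HIDDEN) for name in ("Wi", "Wf", "Wo", "Wg")}
params["W1"] = rand_matrix(rng, FF, 2 * HIDDEN)
params["w2"] = [rng.uniform(-0.1, 0.1) for _ in range(FF)]

# Random stand-ins for embeddings learned from the GO network.
go_embedding = {
    term: [rng.uniform(-1.0, 1.0) for _ in range(EMBED)]
    for term in ("term_1", "term_2", "term_3", "term_4")
}

protein_a = [go_embedding[t] for t in ("term_1", "term_2", "term_3")]
protein_b = [go_embedding[t] for t in ("term_2", "term_4")]

score = predict_interaction(protein_a, protein_b, params, HIDDEN)
print("interaction score:", score)
```

Because the LSTM consumes a sequence, the two proteins can carry different numbers of GO terms and still yield protein vectors of the same length. With untrained weights the printed score is meaningless; in practice the weights would be fitted on known PPIs, which is how the abstract describes combining GO information with interaction data.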