IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):666-675. doi: 10.1109/TCBB.2021.3080386. Epub 2022 Apr 1.
Identifying protein subcellular locations is an important topic in protein function prediction. Interacting proteins may share similar locations, so it is important to take protein-protein interactions (PPIs) into account when inferring protein subcellular locations. In this study, we present a network embedding-based method, node2loc, to identify protein subcellular locations. node2loc first learns distributed embeddings of proteins in a protein-protein interaction (PPI) network using node2vec. The learned embeddings are then fed into a recurrent neural network (RNN). To address the severe class imbalance among subcellular locations, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to synthesize artificial proteins for minority classes. node2loc is evaluated on our constructed human benchmark dataset with 16 subcellular locations and yields a Matthews correlation coefficient (MCC) value of 0.800, which is superior to baseline methods. In addition, node2loc achieves better performance on a yeast benchmark dataset with 17 locations. The results demonstrate that the representations learned from a PPI network have discriminative ability for classifying protein subcellular locations. However, node2loc is a transductive method: it only works for proteins connected in a PPI network, and it needs to be retrained for new proteins. In addition, the PPI network needs to be annotated to some extent with location information. node2loc is freely available at https://github.com/xypan1232/node2loc.
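The over-sampling step mentioned above can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the node2loc implementation: the function name `smote_sketch`, its parameters, and the toy two-dimensional "embeddings" are hypothetical, and a real pipeline would more likely rely on an established library. The sketch only shows the core idea, namely that each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        # The new point lies on the segment from base towards neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D "embeddings" for a rare location class (made-up numbers).
rare_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_sketch(rare_class, n_new=5)
```

In a setting like the one described, the inputs would be the learned protein embeddings of a minority location class rather than toy points, and the synthetic samples would be added to the training set only, never to the evaluation set.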