Pan Xiaoyong, Li Hao, Zeng Tao, Li Zhandong, Chen Lei, Huang Tao, Cai Yu-Dong
School of Life Sciences, Shanghai University, Shanghai, China.
Key Laboratory of System Control and Information Processing, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Ministry of Education of China, Shanghai, China.
Front Genet. 2021 Jan 20;11:626500. doi: 10.3389/fgene.2020.626500. eCollection 2020.
The function of a protein is largely determined by its subcellular localization. Many computational methods have been proposed for predicting protein subcellular localization; however, they leave room for improvement, particularly in how proteins are represented. In this study, we present an embedding-based method for predicting the subcellular localization of proteins. We first learn functional embeddings of KEGG/GO terms, which are then used to represent proteins. We then learn network embeddings of proteins from a protein-protein network. The functional and network embeddings are combined into a novel protein representation that is used to build the final classification model. On our benchmark dataset of 4,861 proteins from 16 locations, the best model achieves a Matthews correlation coefficient of 0.872, outperforming multiple conventional methods.
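To make the two technical ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "combined" means column-wise concatenation of the per-protein functional and network embedding matrices, and that the reported Matthews correlation coefficient is the standard multiclass generalization computed from a confusion matrix; neither detail is stated in the abstract. The function names `combine_embeddings` and `multiclass_mcc` are hypothetical.

```python
import numpy as np


def combine_embeddings(functional, network):
    """Concatenate functional and network embeddings (assumed combination rule).

    Both inputs are matrices with one row per protein, in the same row order.
    """
    functional = np.asarray(functional, dtype=float)
    network = np.asarray(network, dtype=float)
    if functional.shape[0] != network.shape[0]:
        raise ValueError("both matrices need one row per protein")
    return np.hstack([functional, network])


def multiclass_mcc(y_true, y_pred, n_classes):
    """Multiclass Matthews correlation coefficient from integer class labels."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1

    s = int(cm.sum())        # total number of samples
    c = int(np.trace(cm))    # number of correctly classified samples
    t_k = cm.sum(axis=1)     # how often each class truly occurs
    p_k = cm.sum(axis=0)     # how often each class is predicted

    numerator = c * s - int(np.dot(t_k, p_k))
    denominator = np.sqrt(
        float(s * s - np.dot(p_k, p_k)) * float(s * s - np.dot(t_k, t_k))
    )
    # Convention: return 0.0 when the coefficient is undefined.
    return 0.0 if denominator == 0 else float(numerator / denominator)


if __name__ == "__main__":
    # Toy data only: 3 proteins, 4-dim functional and 2-dim network embeddings.
    features = combine_embeddings(np.ones((3, 4)), np.zeros((3, 2)))
    print(features.shape)
    print(multiclass_mcc([0, 1, 2, 0], [0, 1, 2, 0], n_classes=3))
```

The combined feature matrix would then be passed to a classifier of choice. For two classes, `multiclass_mcc` reduces to the familiar binary MCC, and a perfect prediction gives 1.0.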