He Fei, Wang Rui, Li Jiagen, Bao Lingling, Xu Dong, Zhao Xiaowei
School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China.
Institution of Computational Biology, Northeast Normal University, Changchun, 130117, China.
BMC Syst Biol. 2018 Nov 22;12(Suppl 6):109. doi: 10.1186/s12918-018-0628-0.
Ubiquitination, also called "lysine ubiquitination", occurs when a ubiquitin is attached to lysine (K) residues in target proteins. As one of the most important post-translational modifications (PTMs), it plays a significant role not only in protein degradation but also in other cellular functions. Thus, systematic analysis of the ubiquitination proteome is an appealing and challenging research topic. Existing methods for identifying protein ubiquitination sites fall into two kinds: mass spectrometry and computational methods. Mass spectrometry-based experimental methods can discover ubiquitination sites in eukaryotes, but they are time-consuming and expensive. Therefore, it is a priority to develop computational approaches that can effectively and accurately identify protein ubiquitination sites.
Existing computational methods usually require feature engineering, which may lead to redundant and biased representations. Deep learning, by contrast, can extract underlying characteristics from large-scale training data through multi-layer networks and non-linear mapping operations. In this paper, we propose a multimodal deep architecture to identify ubiquitination sites. First, drawing on prior biological knowledge, we encoded the protein sequence fragments around candidate ubiquitination sites into three modalities, namely raw protein sequence fragments, physico-chemical properties, and sequence profiles, and designed different deep network layers to extract hidden representations from each. Then, the deep representations generated for the three modalities were merged to build the final model. We evaluated our algorithm on PLMD, the largest available database of protein ubiquitination sites, and achieved 66.4% specificity, 66.7% sensitivity, 66.43% accuracy, and a Matthews correlation coefficient (MCC) of 0.221. A number of comparative experiments also indicated that our multimodal deep architecture outperformed several popular protein ubiquitination site prediction tools.
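To make the first modality and the reported metrics concrete, the following is a minimal sketch, not the authors' implementation. It extracts fixed-length fragments centred on each lysine, one-hot encodes them, and computes sensitivity, specificity, accuracy, and MCC from a confusion matrix. The flank length, the padding symbol `X`, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for positions past the sequence ends


def extract_fragments(sequence, flank=3):
    """Return (position, fragment) pairs, one per lysine, centred on the K.

    `flank` is an illustrative value, not the window size used in the paper.
    """
    padded = PAD * flank + sequence + PAD * flank
    fragments = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in `sequence` is index i + flank in `padded`, so this
            # slice spans `flank` residues on each side of the lysine.
            fragments.append((i, padded[i:i + 2 * flank + 1]))
    return fragments


def one_hot(fragment):
    """Encode each residue as a 20-dimensional indicator vector.

    Padding and non-standard residues map to an all-zero vector.
    """
    return [[1 if residue == aa else 0 for aa in AMINO_ACIDS]
            for residue in fragment]


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and MCC from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # MCC is conventionally set to 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy sequence, for illustration only.
fragments = extract_fragments("MKTAYK", flank=2)
# fragments == [(1, 'XMKTA'), (5, 'AYKXX')]
encoded = one_hot(fragments[0][1])  # 5 positions x 20 channels
```

In a multimodal setup like the one described, an encoding of this kind would feed only the raw-sequence branch; the physico-chemical and sequence-profile modalities would need their own encoders, which are not sketched here.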
The comparative experiments validated the effectiveness of our deep network and showed that our method outperformed several popular protein ubiquitination site prediction tools. The source code of our proposed method is available at https://github.com/jiagenlee/deepUbiquitylation.