Large-scale prediction of protein ubiquitination sites using a multimodal deep architecture.

Authors

He Fei, Wang Rui, Li Jiagen, Bao Lingling, Xu Dong, Zhao Xiaowei

Affiliations

School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China.

Institution of Computational Biology, Northeast Normal University, Changchun, 130117, China.

Publication information

BMC Syst Biol. 2018 Nov 22;12(Suppl 6):109. doi: 10.1186/s12918-018-0628-0.

DOI:10.1186/s12918-018-0628-0

PMID:30463553

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6249717/

Abstract

BACKGROUND

Ubiquitination, also called "lysine ubiquitination", occurs when a ubiquitin is attached to lysine (K) residues in target proteins. As one of the most important post-translational modifications (PTMs), it plays a significant role not only in protein degradation but also in other cellular functions. Systematic analysis of the ubiquitination proteome is therefore an appealing and challenging research topic. Existing methods for identifying protein ubiquitination sites fall into two kinds: mass spectrometry and computational methods. Mass spectrometry-based experimental methods can discover ubiquitination sites in eukaryotes, but they are time-consuming and expensive. It is therefore a priority to develop computational approaches that can identify protein ubiquitination sites effectively and accurately.

RESULTS

Existing computational methods usually require feature engineering, which may lead to redundant and biased representations. Deep learning, by contrast, can extract underlying characteristics from large-scale training data through multi-layer networks and non-linear mapping operations. In this paper, we propose a deep architecture with multiple modalities to identify ubiquitination sites. First, drawing on prior and biological knowledge, we encoded the protein sequence fragments around candidate ubiquitination sites into three modalities (raw protein sequence fragments, physico-chemical properties, and sequence profiles) and designed different deep network layers to extract hidden representations from each. Then, the deep representations generated for the three modalities were merged to build the final model. We evaluated our algorithm on PLMD, the largest available database of protein ubiquitination sites, and achieved 66.4% specificity, 66.7% sensitivity, 66.43% accuracy, and an MCC value of 0.221. A number of comparative experiments also indicated that our multimodal deep architecture outperformed several popular protein ubiquitination site prediction tools.
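To make the first of the three modalities concrete, the following is a minimal Python sketch of how a raw sequence fragment around each candidate lysine might be extracted and one-hot encoded. It is an illustration only, not the authors' implementation: the function names, the padding symbol `-`, the toy sequence, and the `half_window` value are all assumptions, since the abstract does not state the window size or the encoding details.

```python
# Illustrative sketch only: names, padding symbol, and window size are assumptions,
# not details taken from the paper or its source code.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # hypothetical symbol for positions beyond the sequence termini


def lysine_fragments(sequence, half_window):
    """Yield (position, fragment) for every lysine (K) in the sequence.

    Each fragment has length 2 * half_window + 1 and is centered on the lysine;
    positions that fall outside the protein are filled with PAD.
    """
    padded = PAD * half_window + sequence + PAD * half_window
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i sits at index i + half_window in the padded string,
            # so this slice is centered on it.
            yield i, padded[i:i + 2 * half_window + 1]


def one_hot(fragment):
    """Encode a fragment as a list of 20-dimensional one-hot rows.

    Padding (or any non-standard symbol) becomes an all-zero row.
    """
    index = {aa: j for j, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in fragment:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        rows.append(row)
    return rows


# Toy usage with a made-up sequence and a deliberately small window.
fragments = list(lysine_fragments("MKTAYKQ", half_window=2))
encoded = [one_hot(fragment) for _, fragment in fragments]
```

Each encoded fragment is a fixed-size matrix (window length by 20) that a network layer can consume. Under the same assumptions, the physico-chemical and sequence-profile modalities would be built over the same windows with different per-residue feature vectors.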

CONCLUSION

The comparative experiments validated the effectiveness of our deep network and showed that our method outperformed several popular protein ubiquitination site prediction tools. The source code of the proposed method is available at https://github.com/jiagenlee/deepUbiquitylation .

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/174c/6249717/cf477c3e5d29/12918_2018_628_Fig1_HTML.jpg
