Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100191, China.
Key Laboratory for Neuroscience, Ministry of Education/National Health Commission of China, Peking University, Beijing, 100191, China.
BMC Biol. 2022 Jul 14;20(1):162. doi: 10.1186/s12915-022-01364-6.
Degrons are short linear motifs that are bound by E3 ubiquitin ligases to target protein substrates for degradation by the ubiquitin-proteasome system. Mutations that deregulate degron function disrupt the control of protein abundance, because proteins destined for degradation are mistargeted, and often result in pathologies. Targeting degrons with small molecules is also emerging as a drug design strategy for upregulating the expression of specific proteins. Despite their essential function and their potential as disease targets, reliable identification of degrons remains difficult. Here, we developed a deep learning-based model named Degpred that predicts general degrons directly from protein sequences.
We showed that the BERT-based model performed well in predicting degrons solely from protein sequences. We then used the deep learning model Degpred to predict degrons proteome-wide. Degpred captured typical degron-related sequence properties and predicted degrons beyond those found by motif-based methods, which match possible degrons against a handful of E3 motifs. Furthermore, we calculated E3 motifs from the predicted degrons on the substrates in our collected E3-substrate interaction dataset, and constructed a regulatory network of protein degradation by assigning predicted degrons to specific E3s according to the calculated motifs. Critically, we experimentally verified that a predicted SPOP-binding degron on CBX6 promotes CBX6 degradation and mediates its interaction with SPOP. By surveying degron-related mutations in TCGA, we also showed that the protein degradation regulatory system is important in tumorigenesis.
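To make concrete what a motif-based method does, the following is a minimal sketch of scanning a sequence with a regular-expression motif. It illustrates the baseline approach that Degpred is compared against, not Degpred itself. The pattern `TOY_MOTIF`, the function `scan_motif`, and the example sequence are all hypothetical and chosen for illustration only; they are not taken from the paper or from any curated motif resource.

```python
import re

# Hypothetical, simplified motif used only for illustration (an assumption,
# not a curated E3 motif): one of A/V/P, any residue, S, then two S/T residues.
# The lookahead wrapper lets overlapping matches be reported as well.
TOY_MOTIF = re.compile(r"(?=([AVP].S[ST][ST]))")


def scan_motif(sequence, pattern=TOY_MOTIF):
    """Return (start, end, segment) for every motif hit in the sequence.

    Coordinates are 0-based and end-exclusive; overlapping hits are included.
    """
    hits = []
    for match in pattern.finditer(sequence):
        segment = match.group(1)
        hits.append((match.start(), match.start() + len(segment), segment))
    return hits


if __name__ == "__main__":
    # Made-up toy sequence, not a real protein.
    toy_sequence = "MKTAVSSTSLLPESTTGG"
    for start, end, segment in scan_motif(toy_sequence):
        print(f"{start}-{end}: {segment}")
```

A scanner of this kind can only recover degrons that match one of the few motifs it was given, which is the limitation a sequence-based predictor is meant to address.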
Degpred provides an efficient tool for proteome-wide prediction of degrons and their binding E3s solely from protein sequences. Degpred captures typical degron-related sequence properties and predicts degrons beyond those found by previously used motif-based methods, thus greatly expanding the degron landscape. This should advance the understanding of protein degradation and allow exploration of uncharacterized protein alterations in disease. To make the collected and predicted datasets easier to access, we integrated them into the website http://degron.phasep.pro/ .