Elucidation of genome-wide understudied proteins targeted by PROTAC-induced degradation using interpretable machine learning.

Affiliations

Department of Computer Science, Hunter College, The City University of New York, New York City, New York, United States of America.

Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York City, New York, United States of America.

Publication information

PLoS Comput Biol. 2023 Aug 17;19(8):e1010974. doi: 10.1371/journal.pcbi.1010974. eCollection 2023 Aug.

DOI:10.1371/journal.pcbi.1010974

PMID:37590332

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10464998/

Abstract

Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules that induce the degradation of target proteins by recruiting an E3 ligase. PROTACs have the potential to inactivate disease-related genes that are considered undruggable by small molecules, making them a promising therapy for the treatment of incurable diseases. However, only a few hundred proteins have been experimentally tested for their amenability to PROTACs, and it remains unclear which other proteins in the entire human genome can be targeted by PROTACs. In this study, we have developed PrePROTAC, an interpretable machine learning model that combines a transformer-based protein sequence descriptor with random forest classification. PrePROTAC predicts genome-wide targets that can be degraded by CRBN, one of the E3 ligases. In the benchmark studies, PrePROTAC achieved a ROC-AUC of 0.81, an average precision of 0.84, and over 40% sensitivity at a false positive rate of 0.05. When evaluated by an external test set which comprised proteins from different structural folds than those in the training set, the performance of PrePROTAC did not drop significantly, indicating its generalizability. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method, which extends conventional SHAP analysis for original features to an embedding space through in silico mutagenesis. This method allowed us to identify key residues in the protein structure that play critical roles in PROTAC activity. The identified key residues were consistent with existing knowledge. Using PrePROTAC, we identified over 600 novel understudied proteins that are potentially degradable by CRBN and proposed PROTAC compounds for three novel drug targets associated with Alzheimer's disease.
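The benchmark figures quoted above (ROC-AUC, average precision, and sensitivity at a fixed false positive rate) are standard binary-classification metrics. As a minimal sketch, and not the authors' evaluation code, the plain-Python functions below compute all three from a classifier's scores. The labels and scores in the usage lines are invented toy values, not data from the study; in practice scikit-learn's `roc_auc_score` and `average_precision_score` cover the first two.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ranked(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, int]]:
    # Sort (score, label) pairs from most to least confident prediction.
    return sorted(zip(scores, labels), key=lambda t: -t[0])


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k of the positives (assumes no tied scores)."""
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(_ranked(labels, scores), start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


def sensitivity_at_fpr(labels: Sequence[int], scores: Sequence[float], max_fpr: float = 0.05) -> float:
    """Highest true positive rate among thresholds whose false positive rate is <= max_fpr."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    ranked = _ranked(labels, scores)
    best, tp, fp, i = 0.0, 0, 0, 0
    while i < len(ranked):
        # Lower the threshold to the next distinct score, admitting all tied items at once.
        s = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best


# Toy example: 1 = degradable, 0 = not degradable (invented values for illustration only).
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
print(roc_auc(toy_labels, toy_scores))             # 0.9375
print(average_precision(toy_labels, toy_scores))   # ~0.95
print(sensitivity_at_fpr(toy_labels, toy_scores))  # 0.75
```

Sensitivity at a low false positive rate is the strictest of the three: with only four negatives in this toy set, any false positive already exceeds a rate of 0.05, so only thresholds above every negative score qualify.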

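The abstract describes eSHAP as extending SHAP from the original features to an embedding space through in silico mutagenesis. The exact eSHAP formula is not given in the abstract, so the sketch below shows only the generic in silico mutagenesis scan that the idea builds on: replace each residue, re-embed the sequence, re-score it, and record the change in the model's output. `toy_embed` (amino-acid composition), `make_linear_scorer`, and the alanine substitution are hypothetical stand-ins for the transformer embedding and the random forest; no SHAP values are computed here.

```python
from typing import Callable, Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_embed(seq: str) -> List[float]:
    """Hypothetical stand-in for a language-model embedding: amino-acid composition."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def make_linear_scorer(weights: Dict[str, float]) -> Callable[[Sequence[float]], float]:
    """Hypothetical stand-in for a trained classifier: a weighted sum of embedding features."""
    w = [weights.get(aa, 0.0) for aa in AMINO_ACIDS]

    def score(embedding: Sequence[float]) -> float:
        return sum(wi * ei for wi, ei in zip(w, embedding))

    return score


def mutagenesis_scan(
    seq: str,
    embed: Callable[[str], Sequence[float]],
    score: Callable[[Sequence[float]], float],
    substitute: str = "A",
) -> List[float]:
    """Score drop when each residue is replaced by `substitute`; larger means more important."""
    baseline = score(embed(seq))
    effects = []
    for i, residue in enumerate(seq):
        if residue == substitute:
            effects.append(0.0)  # mutating to the same residue changes nothing
            continue
        mutant = seq[:i] + substitute + seq[i + 1:]
        effects.append(baseline - score(embed(mutant)))
    return effects


scorer = make_linear_scorer({"K": 1.0, "L": 0.5})
print(mutagenesis_scan("KLAK", toy_embed, scorer))  # [0.25, 0.125, 0.0, 0.25]
```

Because the toy embedding ignores residue order, identical residues receive identical effects; a position-aware embedding such as a transformer's can distinguish them, which is what makes residue-level attribution informative.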
Similar articles

Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning.

bioRxiv. 2023 Feb 24:2023.02.23.529828. doi: 10.1101/2023.02.23.529828.

PROTAC-DB 2.0: an updated database of PROTACs.

Nucleic Acids Res. 2023 Jan 6;51(D1):D1367-D1372. doi: 10.1093/nar/gkac946.

Structural-based design of HD-TAC7 PROteolysis TArgeting chimeras (PROTACs) candidate transformations to abrogate SARS-CoV-2 infection.

J Biomol Struct Dyn. 2023;41(23):14566-14581. doi: 10.1080/07391102.2023.2183037. Epub 2023 Feb 25.

Lessons in PROTAC Design from Selective Degradation with a Promiscuous Warhead.

Cell Chem Biol. 2018 Jan 18;25(1):78-87.e5. doi: 10.1016/j.chembiol.2017.09.010. Epub 2017 Nov 9.

Disordered region of cereblon is required for efficient degradation by proteolysis-targeting chimera.

Sci Rep. 2019 Dec 23;9(1):19654. doi: 10.1038/s41598-019-56177-5.

PRosettaC: Rosetta Based Modeling of PROTAC Mediated Ternary Complexes.

J Chem Inf Model. 2020 Oct 26;60(10):4894-4903. doi: 10.1021/acs.jcim.0c00589. Epub 2020 Oct 6.

Discovery of E3 Ligase Ligands for Target Protein Degradation.

Molecules. 2022 Oct 2;27(19):6515. doi: 10.3390/molecules27196515.

PROTACs: An Emerging Targeting Technique for Protein Degradation in Drug Discovery.

Bioessays. 2018 Apr;40(4):e1700247. doi: 10.1002/bies.201700247. Epub 2018 Feb 23.

E3 Ligase Ligands for PROTACs: How They Were Found and How to Discover New Ones.

SLAS Discov. 2021 Apr;26(4):484-502. doi: 10.1177/2472555220965528. Epub 2020 Nov 3.

Cited by

Accurate PROTAC-targeted degradation prediction with DegradeMaster.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i342-i351. doi: 10.1093/bioinformatics/btaf191.

Methods to accelerate PROTAC drug discovery.

Biochem J. 2025 Jun 25;482(13):BCJ20243018. doi: 10.1042/BCJ20243018.

Targeted protein degradation: advances in drug discovery and clinical practice.

Signal Transduct Target Ther. 2024 Nov 6;9(1):308. doi: 10.1038/s41392-024-02004-x.

Targeting glucocorticoid receptor signaling pathway for treatment of stress-related brain disorders.

Pharmacol Rep. 2024 Dec;76(6):1333-1345. doi: 10.1007/s43440-024-00654-w. Epub 2024 Oct 3.

Targeting bacterial degradation machinery as an antibacterial strategy.

Biochem J. 2023 Nov 15;480(21):1719-1731. doi: 10.1042/BCJ20230191.
