Elucidation of genome-wide understudied proteins targeted by PROTAC-induced degradation using interpretable machine learning.

Affiliations

Department of Computer Science, Hunter College, The City University of New York, New York City, New York, United States of America.

Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York City, New York, United States of America.

Publication information

PLoS Comput Biol. 2023 Aug 17;19(8):e1010974. doi: 10.1371/journal.pcbi.1010974. eCollection 2023 Aug.

Abstract

Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules that induce the degradation of target proteins by recruiting an E3 ligase. PROTACs have the potential to inactivate disease-related genes that are considered undruggable by small molecules, making them a promising therapy for incurable diseases. However, only a few hundred proteins have been experimentally tested for their amenability to PROTACs, and it remains unclear which other proteins in the human genome can be targeted by PROTACs. In this study, we developed PrePROTAC, an interpretable machine learning model that combines a transformer-based protein sequence descriptor with random forest classification. PrePROTAC predicts genome-wide targets that can be degraded by CRBN, one of the E3 ligases. In benchmark studies, PrePROTAC achieved a ROC-AUC of 0.81, an average precision of 0.84, and over 40% sensitivity at a false positive rate of 0.05. When evaluated on an external test set comprising proteins from structural folds different from those in the training set, the performance of PrePROTAC did not drop significantly, indicating its generalizability. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method, which extends conventional SHAP analysis of original features to an embedding space through in silico mutagenesis. This method allowed us to identify key residues in the protein structure that play critical roles in PROTAC activity. The identified key residues were consistent with existing knowledge. Using PrePROTAC, we identified over 600 novel understudied proteins that are potentially degradable by CRBN and proposed PROTAC compounds for three novel drug targets associated with Alzheimer's disease.
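For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how a ROC-AUC and a "sensitivity at a false positive rate of 0.05" figure can be computed from a classifier's scores. It is not the authors' evaluation code. The function names (`roc_points`, `roc_auc`, `sensitivity_at_fpr`) are hypothetical, and the labels and scores are toy values invented for illustration, not data from the paper. Average precision is left out to keep the sketch short.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Return ROC points, one per distinct score threshold, starting at (0, 0)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sort from the highest score to the lowest, so the threshold is lowered step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def sensitivity_at_fpr(points: Sequence[Point], max_fpr: float = 0.05) -> float:
    """Highest sensitivity (TPR) among thresholds whose FPR stays at or below max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Toy example (invented values): 1 = degradable, 0 = not degradable.
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

points = roc_points(labels, scores)
auc = roc_auc(points)                         # 0.875 for this toy data
sens_at_5 = sensitivity_at_fpr(points, 0.05)  # 0.5 for this toy data
```

On this toy data, two of the four positives are ranked above every negative, so the sensitivity at an FPR of at most 0.05 is 0.5. Read the same way, the abstract's figure means that over 40% of true targets are recovered while at most 5% of non-targets are flagged.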
