Zhang Ran, Wang Shihang, Wang Lin, Tian Siyuan, Tang Yilin, Bai Fang
Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China.
Center for AI and Computational Biology (CAICB), Institute of Systems Medicine (ISM), Chinese Academy of Medical Sciences, Suzhou 215028, China.
J Chem Inf Model. 2025 Jul 14;65(13):6861-6873. doi: 10.1021/acs.jcim.5c00366. Epub 2025 Jun 25.
Proteolysis-targeting chimeras (PROTACs) have garnered significant attention in drug design because they can induce degradation of target proteins via the ubiquitin-proteasome system. However, synthesizing PROTACs remains challenging and requires consideration of factors such as chemical complexity and accessibility. With the rise of generative artificial intelligence, several PROTAC generation models have been introduced, but tools for evaluating the synthetic accessibility of the generated molecules remain underdeveloped. To address this gap, we propose DeepPSA (Deep learning-based PROTAC Synthetic Accessibility), a deep learning model that predicts the synthetic accessibility of PROTACs. DeepPSA offers a systematic, data-driven way to assess the feasibility of PROTAC synthesis, supporting the design and screening of novel compounds. DeepPSA is a graph-based model built on a graph neural network architecture and trained on an in-house dataset of 3644 PROTACs with experimental synthetic data. As the first model focused specifically on PROTAC synthetic accessibility, DeepPSA achieves 92.9% prediction accuracy and an area under the receiver operating characteristic curve (AUROC) of 0.963 on the test set, indicating that it captures key structural characteristics of PROTACs. DeepPSA also retains superior performance on the structure-based partitioned datasets, further supporting its generalization ability and robustness. DeepPSA is available as a web server (https://bailab.siais.shanghaitech.edu.cn/psa) and in a GitHub repository (https://github.com/Zhang-Ran-0119/DeepPSA).
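The two headline metrics, accuracy and AUROC, summarize a binary classifier's test-set predictions in different ways: accuracy depends on a decision threshold, while AUROC measures how well the predicted scores rank positive molecules above negative ones regardless of threshold. The sketch below illustrates both computations in plain Python. It is not DeepPSA's evaluation code; the label convention (1 = synthetically accessible, 0 = not), the 0.5 threshold, and the function names `accuracy` and `auroc` are hypothetical choices made for illustration.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of molecules whose thresholded score matches the true label.

    Assumption (hypothetical): label 1 = synthetically accessible, 0 = not,
    and a score >= threshold is predicted as class 1.
    """
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and of equal length")
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win, which
    matches the trapezoidal area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy predictions for six hypothetical molecules (not data from the paper).
    y_true = [1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
    print(f"accuracy = {accuracy(y_true, y_score):.4f}")  # 4 of 6 correct -> 0.6667
    print(f"AUROC    = {auroc(y_true, y_score):.4f}")     # 7 of 9 pairs ranked correctly -> 0.7778
```

The pairwise loop is quadratic in the number of test molecules, which is acceptable for a few hundred examples; for larger sets a rank-based (Mann-Whitney U) formulation or a library routine such as scikit-learn's `roc_auc_score` gives the same value more efficiently.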