Lane Thomas R, Snyder Scott H, Harris Joshua S, Urbina Fabio, Ekins Sean
Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
ACS Chem Neurosci. 2025 Jun 4;16(11):2085-2099. doi: 10.1021/acschemneuro.5c00177. Epub 2025 May 14.
Central nervous system (CNS) drugs have the highest clinical attrition, often due to CNS-related toxicities such as drug-induced seizures (DIS). Early prediction of DIS risk could reduce failure rates and optimize drug development by prioritizing testing in experimental models of DIS. Using seizure-relevant Adverse Outcome Pathways (AOPs) from various sources, we identified 67 seizure-associated protein targets. Biological activity data (e.g., EC50, IC50) for these targets were curated from ChEMBL, enabling development of ∼2000 regression and classification (random forest, support vector, XGBoost) models. Support vector regression (SVR) models achieved an average MAE of 0.54 ± 0.09 (-log units), while random forest classifiers yielded mean ROC AUC, accuracy, and recall of 0.88, 0.85, and 0.70, respectively (5-fold CV) across all targets. Multitarget XGBoost models concatenating ECFP6 fingerprints and target encodings (one-hot or ProtBERT) also demonstrated excellent overall performance, although their predictive accuracy was notably lower on leave-out sets than that of individual target-specific models. These models were then used to predict activity for compounds in a seizure-liability data set, yielding target-annotated DIS risk predictions. Overall, our findings support the utility of target-specific machine-learning models for DIS prediction to aid in prioritizing early toxicity testing and to reduce CNS drug attrition.
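To make the regression metric concrete, the sketch below shows how an MAE reported in -log units relates to fold error in concentration. It is not the authors' code: the function names and the example values are hypothetical, and it assumes activities are given in nanomolar and converted to -log10 of the molar concentration.

```python
import math

def to_neg_log_molar(value_nm):
    """Convert a concentration in nM to -log10(molar), e.g. IC50 -> pIC50."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(value_nm)

def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def fold_error(mae_log_units):
    """Typical multiplicative error in concentration implied by a log-scale MAE."""
    return 10.0 ** mae_log_units

# Hypothetical measured IC50 values (nM) and hypothetical model predictions (-log units).
measured_nm = [10.0, 100.0, 1000.0]
measured = [to_neg_log_molar(v) for v in measured_nm]
predicted = [7.5, 7.2, 6.6]

mae = mean_absolute_error(measured, predicted)
print(f"MAE = {mae:.2f} -log units, about {fold_error(mae):.1f}-fold in concentration")
```

Read this way, the reported average MAE of 0.54 corresponds to predictions that are typically within about 3.5-fold of the measured concentration.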