Sahu Sunil, Anmol Adarsh, Nishad Tushar, Jujjavarapu Satya Eswari
Department of Biotechnology, National Institute of Technology Raipur, Raipur, Chhattisgarh, 492001, India.
Mol Divers. 2025 Sep 10. doi: 10.1007/s11030-025-11287-3.
Traditional drug discovery methods such as high-throughput screening and molecular docking are slow and costly. This study introduces a machine learning framework that predicts bioactivity (pIC₅₀) and identifies key molecular properties and structural features for targeting Trypanothione reductase (TR), Protein kinase C theta (PKC-θ), and Cannabinoid receptor 1 (CB1), using data from the ChEMBL database. Molecular fingerprints, generated with PaDEL-Descriptor and RDKit, encoded structural features as binary vectors. Three models were used to predict pIC₅₀: Random Forest (RF), Gradient Boosting (GB), and a stacking ensemble with Ridge Regression. The ensemble achieved the lowest RMSE. The results highlight heteroatom-containing rings for TR, multiple ring systems for PKC-θ, and aromatic rings for CB1 as critical for high bioactivity. This adaptable framework accelerates drug design by pinpointing structures that can be optimized, improving efficiency in therapeutic development.
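The modelling step described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. The fingerprints here are random binary vectors standing in for PaDEL-Descriptor or RDKit output, and the pIC₅₀ values are synthetic. The fingerprint length, hyperparameters, train/test split, and the use of Ridge as the stacking meta-learner are all assumptions, and the helper names `pic50_from_nm` and `rmse` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def pic50_from_nm(ic50_nm):
    """Convert IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return 9.0 - np.log10(np.asarray(ic50_nm, dtype=float))


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted pIC50."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Synthetic stand-in data (assumption): 300 compounds, 64-bit fingerprints.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 64)).astype(float)
bit_weights = rng.normal(0.0, 0.5, size=64)
y = 6.0 + X @ bit_weights + rng.normal(0.0, 0.3, size=300)  # fake pIC50

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners named in the abstract; hyperparameters are illustrative.
base_models = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
]

# Stacking ensemble: Ridge combines out-of-fold predictions of RF and GB.
stack = StackingRegressor(
    estimators=base_models, final_estimator=Ridge(alpha=1.0), cv=5
)

# Fit each model and record its held-out RMSE for comparison.
scores = {}
for name, model in base_models + [("stack", stack)]:
    model.fit(X_train, y_train)
    scores[name] = rmse(y_test, model.predict(X_test))

for name, value in scores.items():
    print(f"{name}: RMSE = {value:.3f}")
```

With real data, `X` would be replaced by the fingerprint matrix for one target's ChEMBL compounds and `y` by their measured pIC₅₀ values. Which model scores best on this synthetic data says nothing about the reported results.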