Yu Tianshi, Nantasenamat Chanin, Kachenton Supicha, Anuwongcharoen Nuttapat, Piacham Theeraphon
Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
Streamlit Open Source, Snowflake Inc., San Mateo, California 94402, United States.
ACS Omega. 2023 Feb 13;8(7):6729-6742. doi: 10.1021/acsomega.2c07346. eCollection 2023 Feb 21.
Prostate cancer (PCa) is a leading cause of cancer mortality among males. Numerous studies have sought to develop antagonists of the androgen receptor (AR), a crucial therapeutic target for PCa. This study presents a systematic cheminformatic analysis and machine learning modeling of the chemical space, scaffolds, structure-activity relationships, and activity landscape of human AR antagonists. The final data set comprises 1678 molecules. Visualization of physicochemical properties shows that molecules in the potent/active class generally have a slightly lower molecular weight (MW), octanol-water partition coefficient (log P), number of hydrogen-bond acceptors (nHA), number of rotatable bonds (nRot), and topological polar surface area (TPSA) than molecules in the intermediate/inactive class. In the principal component analysis (PCA) plot, the distributions of the two classes overlap substantially; potent/active molecules are densely clustered, whereas intermediate/inactive molecules are widely and sparsely distributed. Murcko scaffold analysis shows low scaffold diversity overall, and the scaffold diversity of the potent/active class is even lower than that of the intermediate/inactive class, indicating the need to develop molecules with novel scaffolds. Furthermore, scaffold visualization identified 16 representative Murcko scaffolds. Among them, scaffolds 1, 2, 3, 4, 7, 8, 10, 11, 15, and 16 are highly favorable owing to their high scaffold enrichment factor values. Based on the scaffold analysis, their local structure-activity relationships (SARs) were investigated and summarized. In addition, the global SAR landscape was explored by quantitative structure-activity relationship (QSAR) modeling and structure-activity landscape visualization.
Of 12 candidate models for AR antagonists, a QSAR classification model incorporating all 1678 molecules performed best (built on PubChem fingerprints with the extra trees algorithm; accuracy of 0.935 on the training set, 0.735 in 10-fold cross-validation, and 0.756 on the test set). A closer examination of the structure-activity landscape highlighted seven significant activity cliff (AC) generators (ChEMBL molecule IDs: 160257, 418198, 4082265, 348918, 390728, 4080698, and 6530), which provide valuable SAR information for medicinal chemistry. The findings of this study provide new insights and guidelines for hit identification and lead optimization in the development of novel AR antagonists.
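The abstract ranks scaffolds by their scaffold enrichment factor but does not state the formula. As a minimal sketch, assuming the commonly used definition (the fraction of active molecules carrying a scaffold divided by the fraction of active molecules in the whole data set), the calculation might look like the following. The function name `scaffold_enrichment`, the scaffold labels, and the toy records are hypothetical and are not taken from the study.

```python
from collections import Counter


def scaffold_enrichment(records):
    """Return {scaffold: enrichment factor} for (scaffold, is_active) pairs.

    Assumed definition (not confirmed by the abstract):
        EF = (actives with scaffold / molecules with scaffold)
             / (actives in data set / molecules in data set)
    """
    records = list(records)
    total = len(records)
    total_active = sum(1 for _, is_active in records if is_active)
    if total == 0 or total_active == 0:
        raise ValueError("need at least one molecule and one active molecule")

    baseline = total_active / total  # active fraction of the whole data set
    per_scaffold = Counter(scaffold for scaffold, _ in records)
    per_scaffold_active = Counter(
        scaffold for scaffold, is_active in records if is_active
    )
    return {
        scaffold: (per_scaffold_active[scaffold] / count) / baseline
        for scaffold, count in per_scaffold.items()
    }


# Hypothetical toy data: scaffold "A" is mostly active, "B" mostly inactive.
toy_records = (
    [("A", True)] * 3 + [("A", False)]
    + [("B", True)] + [("B", False)] * 3
)
toy_ef = scaffold_enrichment(toy_records)
```

Under this assumed definition, a value above 1 means the scaffold is over-represented among active molecules relative to the data set as a whole, which is the sense in which a scaffold would be called favorable.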