M3S-GRPred: a novel ensemble learning approach for the interpretable prediction of glucocorticoid receptor antagonists using a multi-step stacking strategy.

Authors

Schaduangrat Nalini, Chuntakaruk Hathaichanok, Rungrotmongkol Thanyada, Mookdarsanit Pakpoom, Shoombuatong Watshara

Affiliations

Faculty of Medical Technology, Center for Research Innovation and Biomedical Informatics, Mahidol University, Bangkok, 10700, Thailand.

Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, 10330, Thailand.

Publication information

BMC Bioinformatics. 2025 Apr 30;26(1):117. doi: 10.1186/s12859-025-06132-1.

DOI: 10.1186/s12859-025-06132-1

PMID: 40307679

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12044944/

Abstract

Accelerating drug discovery for glucocorticoid receptor (GR)-related disorders, including through innovative machine learning (ML)-based approaches, holds promise for advancing therapeutic development, optimizing treatment efficacy, and mitigating adverse effects. While experimental methods can accurately identify GR antagonists, they are often not cost-effective for large-scale drug discovery. Thus, computational approaches that leverage SMILES information for precise in silico identification of GR antagonists are crucial, enabling efficient and scalable drug discovery. Here, we develop a new ensemble learning approach using a multi-step stacking strategy (M3S), termed M3S-GRPred, aimed at rapidly and accurately discovering novel GR antagonists. To the best of our knowledge, M3S-GRPred is the first SMILES-based predictor designed to identify GR antagonists without the use of 3D structural information. In M3S-GRPred, we first constructed different balanced subsets using an under-sampling approach. Using these balanced subsets, we explored and evaluated heterogeneous base-classifiers trained with a variety of SMILES-based feature descriptors coupled with popular ML algorithms. Finally, M3S-GRPred was constructed by integrating the probabilistic features from the base-classifiers selected by a two-step feature selection technique. Our comparative experiments demonstrate that M3S-GRPred can precisely identify GR antagonists and effectively handle the imbalanced dataset. Compared to traditional ML classifiers, M3S-GRPred attained superior performance on both the training and independent test datasets. Additionally, M3S-GRPred was applied to identify potential GR antagonists among FDA-approved drugs; the candidates were confirmed through molecular docking, followed by detailed molecular dynamics (MD) simulation studies, for drug repurposing in Cushing's syndrome.
We anticipate that M3S-GRPred will serve as an efficient screening tool for discovering novel GR antagonists from vast libraries of unknown compounds in a cost-effective manner.
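
The first and last steps described in the abstract (building balanced subsets by under-sampling, then collecting base-classifier probabilities as features for a meta-classifier) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names `balanced_subsets` and `probabilistic_features`, the placeholder molecule identifiers, and the constant-output base classifiers are all assumptions made for illustration, and descriptor calculation, base-classifier training, the two-step feature selection, and the meta-classifier itself are omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (molecule identifier or SMILES string, label); 1 = antagonist, 0 = non-antagonist.
Sample = Tuple[str, int]


def balanced_subsets(
    data: Sequence[Sample], n_subsets: int, seed: int = 0
) -> List[List[Sample]]:
    """Build balanced subsets by randomly under-sampling the majority class.

    Every subset keeps all minority-class samples and draws an equally sized
    random sample (without replacement) from the majority class.
    """
    rng = random.Random(seed)
    positives = [s for s in data if s[1] == 1]
    negatives = [s for s in data if s[1] == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives

    subsets = []
    for _ in range(n_subsets):
        subset = minority + rng.sample(majority, len(minority))
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


def probabilistic_features(
    molecule: str, base_classifiers: Sequence[Callable[[str], float]]
) -> List[float]:
    """Collect each base classifier's predicted probability into one vector.

    In a stacking scheme, this vector is the input to the meta-classifier.
    """
    return [clf(molecule) for clf in base_classifiers]


if __name__ == "__main__":
    # Toy imbalanced data with placeholder identifiers (not real compounds).
    toy = [(f"pos_{i}", 1) for i in range(5)] + [(f"neg_{i}", 0) for i in range(20)]
    for subset in balanced_subsets(toy, n_subsets=3, seed=42):
        print(len(subset), sum(label for _, label in subset))

    # Hypothetical base classifiers that return fixed probabilities.
    stand_ins = [lambda m: 0.2, lambda m: 0.9]
    print(probabilistic_features("pos_0", stand_ins))
```

In a full pipeline, each callable passed to `probabilistic_features` would be a trained model pairing one descriptor type with one ML algorithm, and the resulting vectors would be used to train the final meta-classifier.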
