Falcón-Cano Gabriela, Morales-Helguera Aliuska, Lambert Heather, Cabrera-Pérez Miguel-Ángel, Molina Christophe
PIKAÏROS, S.A, 31650, Saint Orens de Gameville, France.
Departamento de Ciencias Farmacéuticas, Facultad de Ciencias, Universidad Católica del Norte, Angamos, 0610, Antofagasta, Chile.
Sci Rep. 2025 May 4;15(1):15585. doi: 10.1038/s41598-025-99766-3.
Blockade of the human Ether-à-go-go-Related Gene (hERG) potassium channel by small molecules can prolong the QT interval, which can lead to fatal cardiotoxicity. Numerous drugs have been withdrawn from the market because of cardiac side effects, underscoring the need for early identification of hERG toxicity. Although several machine learning (ML) classification models have been developed to this end, robustness, class imbalance, and interpretability remain challenges. Using the largest public database of hERG inhibition, this work integrates eXtreme Gradient Boosting (XGBoost) with Isometric Stratified Ensemble (ISE) mapping (XGB + ISE map) to enhance hERG prediction. An XGBoost consensus model was developed using balanced training sets and diverse variable subsets, yielding robust models that are less affected by class imbalance. In exhaustive validation, the model demonstrated competitive predictive performance, balancing sensitivity (SE = 0.83) and specificity (SP = 0.90). By stratifying the data, ISE mapping estimated the model's applicability domain and improved both the evaluation of prediction confidence and compound selection. Refined variable selection procedures enhanced model interpretability. Variable importance analysis highlighted key molecular determinants (peoe_VSA8, ESOL, SdssC, MaxssO, nRNR2, MATS1i, nRNHR, nRNH2) associated with hERG inhibition. The XGB + ISE map strategy provides an effective approach to identifying promising molecules with reduced hERG inhibition risk in drug discovery campaigns.
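The consensus idea described in the abstract (several base models, each trained on a class-balanced sample and on a different subset of variables, combined by voting) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names, the undersampling scheme, the random choice of variable subsets, the majority-vote rule, and above all the base learner, which is a simple nearest-centroid stand-in so that the example runs with the Python standard library alone. In practice an XGBoost classifier would take its place. The vote fraction returned here is only a crude agreement signal and is not the ISE map.

```python
import random
from statistics import mean


def balanced_sample(X, y, rng):
    """Undersample the majority class so both classes are equally represented."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n = min(len(pos), len(neg))
    idx = rng.sample(pos, n) + rng.sample(neg, n)
    return [X[i] for i in idx], [y[i] for i in idx]


class CentroidLearner:
    """Hypothetical stand-in base learner (nearest class centroid), NOT XGBoost."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def fit_consensus(X, y, n_models=15, n_features=2, seed=0, make_learner=CentroidLearner):
    """Train n_models learners, each on a balanced sample and a random variable subset."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_models):
        cols = rng.sample(range(len(X[0])), n_features)
        Xb, yb = balanced_sample(X, y, rng)
        Xb = [[row[c] for c in cols] for row in Xb]
        members.append((cols, make_learner().fit(Xb, yb)))
    return members


def predict_consensus(members, x):
    """Return (majority-vote label, fraction of members voting for class 1)."""
    votes = [model.predict([x[c] for c in cols]) for cols, model in members]
    frac = sum(votes) / len(votes)
    return int(frac >= 0.5), frac


def sensitivity_specificity(y_true, y_pred):
    """SE = TP / (TP + FN); SP = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

Given a descriptor matrix `X` (a list of numeric rows) and binary labels `y` (1 for hERG blockers), `fit_consensus(X, y)` returns the ensemble members and `predict_consensus(members, x)` returns the voted label for one compound. Passing a different `make_learner` factory, for example one wrapping a gradient-boosting classifier with compatible `fit` and `predict` methods, would bring the sketch closer to the setup the abstract describes.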