Tran-Nguyen Viet-Khoa, Randriharimanamizara Ulrick Fineddie, Taboureau Olivier
Université Paris Cité, CNRS UMR 8251, INSERM ERL 1133, 75013, Paris, France.
J Cheminform. 2025 Jul 24;17(1):110. doi: 10.1186/s13321-025-01063-8.
The human Ether-à-go-go-Related Gene (hERG) potassium channel is crucial for repolarizing the cardiac action potential and regulating the heartbeat. Molecules that inhibit this protein can cause acquired long QT syndrome, increasing the risk of arrhythmias and sudden fatal cardiac arrests. Detecting compounds with potential hERG inhibitory activity is therefore essential to mitigate cardiotoxicity risks. In this article, we present a new hERG data set of unprecedented size, comprising nearly 300,000 molecules reported in PubChem and ChEMBL, approximately 2000 of which were confirmed hERG blockers identified through in vitro assays. Multiple structure-based artificial intelligence (AI) binary classifiers for predicting hERG inhibitors were developed, employing, as descriptors, protein-ligand extended connectivity (PLEC) fingerprints fed into random forest, extreme gradient boosting, and deep neural network (DNN) algorithms. Our best-performing model, a stacking ensemble classifier with a DNN meta-learner, achieved state-of-the-art classification performance, accurately identifying 86% of molecules having half-maximal inhibitory concentrations (IC50 values) not exceeding 20 µM in our challenging test set, including 94% of hERG blockers whose IC50 values were not greater than 1 µM. It also demonstrated superior screening power compared to virtual screening schemes that used existing scoring functions. This model, named "HERGAI," along with relevant input/output data and user-friendly source code, is available in our GitHub repository (https://github.com/vktrannguyen/HERGAI) and can be used to predict drug-induced hERG blockade, even on large data sets.
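The stacking scheme described in the abstract (tree-based and neural base classifiers whose predictions feed a neural meta-learner) can be illustrated with a minimal scikit-learn sketch. This is not the HERGAI implementation. Everything in it is an assumption made for illustration: the fingerprints are random bit vectors rather than real PLEC fingerprints, the labels follow a made-up rule, `GradientBoostingClassifier` stands in for extreme gradient boosting, `MLPClassifier` stands in for the DNNs, and all hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for PLEC fingerprints: 600 samples x 256 bits (assumption).
n_samples, n_bits = 600, 256
X = rng.integers(0, 2, size=(n_samples, n_bits))

# Hypothetical labelling rule: "blocker" (1) if at least 5 of the first 8 bits are set.
y = (X[:, :8].sum(axis=1) >= 5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners mirroring the three algorithm families named in the abstract.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
    ("nn", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)),
]

# Neural meta-learner trained on cross-validated base-learner probabilities.
meta_learner = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=meta_learner,
    cv=3,
)
stack.fit(X_train, y_train)

# Recall on the positive class: the fraction of "blockers" that are recovered,
# which is the kind of figure the abstract reports (86% and 94%).
y_pred = stack.predict(X_test)
recall = recall_score(y_test, y_pred)
print(f"Recall on synthetic blockers: {recall:.2f}")
```

On real data the positive class is far rarer than in this toy set (roughly 2000 blockers among nearly 300,000 molecules), so class weighting or resampling would likely need attention; the sketch does not address that.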