Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium.
Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium.
J Chem Inf Model. 2024 Aug 26;64(16):6410-6420. doi: 10.1021/acs.jcim.4c01102. Epub 2024 Aug 7.
Predicting drug toxicity is a critical aspect of ensuring patient safety during the drug design process. Although conventional machine learning techniques have shown some success in this field, the scarcity of annotated toxicity data poses a significant challenge to improving model performance. In this study, we explore the potential of leveraging large unlabeled small-molecule data sets through semisupervised learning to improve drug cardiotoxicity prediction across three cardiac ion channel targets: the voltage-gated potassium channel (hERG), the voltage-gated sodium channel (Nav1.5), and the voltage-gated calcium channel (Cav1.2). We extensively mined the ChEMBL database, comprising approximately 2 million small molecules, and then employed semisupervised learning to construct robust classification models for this purpose. We achieved a performance boost on highly diverse (i.e., structurally dissimilar) test data sets across all three targets. Using the resulting models, we screened the whole ChEMBL database and a large set of FDA-approved drugs, identifying several compounds with potential cardiac ion channel activity. To ensure broad accessibility and usability for both technical and nontechnical users, we developed a cross-platform graphical user interface that allows users to make predictions and gain insights into the cardiotoxicity of drugs and other small molecules. The software is made available as open source under the permissive MIT license at https://github.com/issararab/CToxPred2.
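The abstract does not say which semisupervised scheme was used, so the following is only a minimal sketch of one common variant, self-training with pseudo-labels, to illustrate how unlabeled molecules can enlarge a small labeled set. Everything here is an assumption made for illustration: the nearest-centroid classifier, the distance-margin confidence rule, the `min_margin` threshold, and the function names are hypothetical and are not taken from CToxPred2. In practice the feature vectors would be molecular descriptors or fingerprints rather than toy 2-D points.

```python
from math import dist


def centroids(X, y):
    """Compute the mean feature vector of each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_with_margin(cents, x):
    """Return the nearest class and the gap between the two nearest centroid distances."""
    ranked = sorted((dist(x, c), label) for label, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=10):
    """Illustrative self-training loop (assumed scheme, not the paper's method).

    Each round refits the centroids, pseudo-labels the unlabeled points whose
    margin is at least `min_margin`, and moves them into the training set.
    Returns the final centroids and the points that were never labeled.
    """
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X, y)
        confident, rest = [], []
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new to learn from; stop early
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = rest
    return centroids(X, y), pool
```

Calling `self_train(X_lab, y_lab, X_unlab)` with a handful of labeled points and a larger unlabeled pool returns the refitted class centroids together with the points that stayed too ambiguous to pseudo-label. The margin threshold controls the trade-off: a higher value admits fewer but more reliable pseudo-labels.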