Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set.

Affiliations

Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium.

Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium.

Publication information

J Chem Inf Model. 2024 Aug 26;64(16):6410-6420. doi: 10.1021/acs.jcim.4c01102. Epub 2024 Aug 7.

Abstract

Predicting drug toxicity is a critical aspect of ensuring patient safety during the drug design process. Although conventional machine learning techniques have shown some success in this field, the scarcity of annotated toxicity data poses a significant challenge in enhancing models' performance. In this study, we explore the potential of leveraging large unlabeled small molecule data sets using semisupervised learning to improve drug cardiotoxicity predictive performance across three cardiac ion channel targets: the voltage-gated potassium channel (hERG), the voltage-gated sodium channel (Nav1.5), and the voltage-gated calcium channel (Cav1.2). We extensively mined the ChEMBL database, comprising approximately 2 million small molecules, and then employed semisupervised learning to construct robust classification models for this purpose. We achieved a performance boost on highly diverse (i.e., structurally dissimilar) test data sets across all three targets. Using our built models, we screened the whole ChEMBL database and a large set of FDA-approved drugs, identifying several compounds with potential cardiac ion channel activity. To ensure broad accessibility and usability for both technical and nontechnical users, we developed a cross-platform graphical user interface that allows users to make predictions and gain insights into the cardiotoxicity of drugs and other small molecules. The software is made available as open source under the permissive MIT license at https://github.com/issararab/CToxPred2.
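
The abstract does not say which semisupervised algorithm was used, so the following is only a generic illustration of one common approach, self-training with pseudo-labels, and should not be read as the authors' method or as code from CToxPred2. It is a minimal sketch that assumes toy two-dimensional feature vectors in place of molecular descriptors, a nearest-centroid classifier in place of a real model, and a hypothetical confidence threshold `min_margin`; all function names are made up for this example.

```python
import math

def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def fit(X, y):
    """Toy nearest-centroid 'model': one centroid per class label."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)}

def predict_with_margin(model, x):
    """Return (label, margin). The margin is the gap between the distances
    to the two nearest centroids and serves as a crude confidence score.
    Requires at least two classes in the model."""
    dists = sorted((math.dist(x, c), label) for label, c in model.items())
    return dists[0][1], dists[1][0] - dists[0][0]

def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=10):
    """Self-training loop: fit on labeled data, pseudo-label the unlabeled
    points predicted with high confidence, add them to the training set,
    and refit. Stops when nothing new is added or the pool is empty."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        model = fit(X, y)
        remaining, added = [], 0
        for x in pool:
            label, margin = predict_with_margin(model, x)
            if margin >= min_margin:
                X.append(x)        # accept the pseudo-label
                y.append(label)
                added += 1
            else:
                remaining.append(x)  # too uncertain, keep unlabeled
        pool = remaining
        if added == 0 or not pool:
            break
    return fit(X, y), pool

# Toy data (assumed): label 0 = inactive, label 1 = channel blocker.
X_labeled = [[0.0, 0.0], [4.0, 4.0]]
y_labeled = [0, 1]
X_unlabeled = [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0], [1.0, 0.5]]

model, leftover = self_train(X_labeled, y_labeled, X_unlabeled)
```

In this toy run the ambiguous point `[2.0, 2.0]`, which lies between the two classes, never reaches the margin threshold and stays unlabeled, while the other three points are absorbed into the training set. A real pipeline would need a calibrated confidence measure and a held-out, structurally dissimilar test set to check that pseudo-labeling actually helps.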
