Critical Assessment of Artificial Intelligence Methods for Prediction of hERG Channel Inhibition in the "Big Data" Era.

Affiliation

National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States.

Publication information

J Chem Inf Model. 2020 Dec 28;60(12):6007-6019. doi: 10.1021/acs.jcim.0c00884. Epub 2020 Dec 1.

Abstract

The rise of novel artificial intelligence (AI) methods necessitates their benchmarking against classical machine learning for a typical drug-discovery project. Inhibition of the potassium ion channel, whose alpha subunit is encoded by the human ether-à-go-go-related gene (hERG), leads to a prolonged QT interval of the cardiac action potential and is a significant safety pharmacology target for the development of new medicines. Several computational approaches have been employed to develop prediction models for the assessment of hERG liabilities of small molecules, including recent work using deep learning methods. Here, we perform a comprehensive comparison of hERG effect prediction models based on classical approaches (random forests and gradient boosting) and modern AI methods [deep neural networks (DNNs) and recurrent neural networks (RNNs)]. The training set (∼9000 compounds) was compiled by integrating the hERG bioactivity data from the ChEMBL database with experimental data generated from a high-throughput thallium flux assay. We utilized different molecular descriptors, including latent descriptors, which are real-valued continuous vectors derived from chemical autoencoders trained on a large chemical space (>1.5 million compounds). The models were prospectively validated on ∼840 compounds screened in the same thallium flux assay. The best results were obtained with the XGBoost method and RDKit descriptors. The comparison of models based only on latent descriptors revealed that the DNNs performed significantly better than the classical methods. The RNNs that operate on SMILES provided the highest model sensitivity. The best models were merged into a consensus model that offered superior performance compared to reference models from academic and commercial domains. Furthermore, we shed light on the potential of AI methods to exploit big data in chemistry and generate novel chemical representations useful in predictive modeling and tailoring a new chemical space.
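The abstract does not say how the best models were merged into a consensus model or how sensitivity was computed. As a minimal sketch only, assuming the consensus is an unweighted mean of each model's predicted probability and that a compound is called a hERG inhibitor at a threshold of 0.5, the idea might look like the following. The function names, the threshold, and all numeric values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def consensus_probability(per_model_probs):
    """Average each compound's predicted probability across models.

    per_model_probs holds one list of probabilities per model, all in
    the same compound order. Unweighted averaging is an assumption here.
    """
    return [mean(column) for column in zip(*per_model_probs)]


def sensitivity(y_true, y_pred):
    """Fraction of true inhibitors (label 1) that were predicted as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy predictions from three models for four compounds.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.8, 0.1, 0.3, 0.7]
model_c = [0.7, 0.3, 0.5, 0.6]
y_true = [1, 0, 1, 1]  # hypothetical assay labels (1 = inhibitor)

consensus = consensus_probability([model_a, model_b, model_c])
labels = [1 if p >= 0.5 else 0 for p in consensus]  # assumed 0.5 cutoff
print(labels, sensitivity(y_true, labels))
```

In this toy case the third compound is a true inhibitor that the averaged score misses, which illustrates why sensitivity is reported separately from overall performance when screening for safety liabilities.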
