Liu Miao, Zhang Li, Li Shimeng, Yang Tianzhou, Liu Lili, Zhao Jian, Liu Hongsheng
School of Life Science, Liaoning University, Shenyang, 110036, China.
School of Life Science, Liaoning University, Shenyang, 110036, China; Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Shenyang, Liaoning University, Shenyang, 110036, China; Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Liaoning University, Shenyang, 110036, China.
Toxicol Lett. 2020 Oct 10;332:88-96. doi: 10.1016/j.toxlet.2020.07.003. Epub 2020 Jul 3.
The human ether-a-go-go-related gene (hERG) encodes a tetrameric potassium channel called Kv11.1. Certain drugs can block this channel, which can lead to long QT syndrome and cardiotoxicity. This is a significant problem in drug development. Using computer models to predict compound cardiotoxicity during the early stages of drug design can help to address it. In this study, we used a dataset of 1865 compounds with known hERG inhibitory activities as a training set. Thirty cardiotoxicity classification models were established using three machine learning algorithms based on molecular fingerprints and molecular descriptors. Using these models as base classifiers, we developed a new cardiotoxicity classification model with better predictive performance using an ensemble learning method. The best base classifier, generated using the XGBoost method with molecular descriptors, achieved an accuracy of 84.8% and an area under the receiver operating characteristic curve (AUC) of 0.876 in five-fold cross-validation. All of the ensemble models that we developed had higher predictive performance than the base classifiers in five-fold cross-validation. The best predictive performance was achieved by the Ensemble-Top7 model, with an accuracy of 84.9% and an AUC of 0.887. We also tested the ensemble model on external validation data and achieved an accuracy of 85.0% and an AUC of 0.786. Furthermore, we identified several hERG-related substructures, which provide valuable information for designing drug candidates.
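The idea of building an ensemble from the best-ranked base classifiers can be illustrated with a minimal sketch. The abstract does not state how the base classifiers were combined, so the code below assumes one common scheme: rank the models by cross-validated AUC, keep the top k, and average their predicted probabilities (soft voting). The model names, AUC scores, and probabilities are hypothetical illustration values, not data from the study.

```python
from statistics import mean


def select_top_k(cv_auc, k):
    """Return the names of the k models with the highest cross-validated AUC."""
    ranked = sorted(cv_auc, key=cv_auc.get, reverse=True)
    return ranked[:k]


def soft_vote(probabilities, members):
    """Average the blocker probabilities of the chosen models, per compound."""
    columns = zip(*(probabilities[name] for name in members))
    return [mean(column) for column in columns]


def classify(avg_probabilities, threshold=0.5):
    """Label a compound 1 (blocker) when the mean probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in avg_probabilities]


# Hypothetical values for three base classifiers and three compounds.
cv_auc = {"xgb_desc": 0.90, "rf_fp": 0.85, "svm_fp": 0.80}
probabilities = {
    "xgb_desc": [0.9, 0.2, 0.6],
    "rf_fp": [0.7, 0.4, 0.3],
    "svm_fp": [0.1, 0.9, 0.9],
}

members = select_top_k(cv_auc, k=2)  # assumed analogue of a "Top-k" ensemble
labels = classify(soft_vote(probabilities, members))
```

With k set to 7 and thirty base classifiers, the same selection step would yield an ensemble in the spirit of Ensemble-Top7, although the combination rule actually used in the study may differ from the averaging assumed here.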