Structural Bioinformatics Group, Charité - University Medicine Berlin, 10115 Berlin, Germany.
BB3R - Berlin Brandenburg 3R Graduate School, Freie Universität Berlin, 14195 Berlin, Germany.
J Chem Inf Model. 2018 Jun 25;58(6):1224-1233. doi: 10.1021/acs.jcim.8b00150. Epub 2018 May 30.
Drug-induced inhibition of the human ether-à-go-go-related gene (hERG)-encoded potassium ion channels can lead to fatal cardiotoxicity. Several marketed drugs and promising drug candidates were recalled because of this concern. Diverse modeling methods ranging from molecular similarity assessment to quantitative structure-activity relationship analysis employing machine learning techniques have been applied to data sets of varying size and composition (number of blockers and nonblockers). In this study, we highlight the challenges involved in the development of a robust classifier for predicting the hERG end point using bioactivity data extracted from the public domain. To this end, three different modeling methods, nearest neighbors, random forests, and support vector machines, were employed to develop predictive models using different molecular descriptors, activity thresholds, and training set compositions. Our models demonstrated superior performance in external validations in comparison with those reported in the previous studies from which the data sets were extracted. The choice of descriptors had little influence on the model performance, with minor exceptions. The criteria used to filter bioactivity data, the activity threshold settings used to separate blockers from nonblockers, and the structural diversity of blockers in the training data set were found to be the crucial indicators of model performance. Training sets based on a binary threshold of 1 μM/10 μM to separate blockers (IC50/Ki ≤ 1 μM) from nonblockers (IC50/Ki > 10 μM) provided superior performance in comparison with those defined using a single threshold (1 μM or 10 μM). A major limitation in using the public domain hERG activity data is the abundance of blockers in comparison with nonblockers at usual activity thresholds, since not many studies report the latter.
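To make the difference between the two labeling schemes concrete, the sketch below applies a 1 μM/10 μM binary threshold and a single cutoff to a list of IC50/Ki values. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how compounds between 1 μM and 10 μM are handled, so here they are assumed to be left unlabeled and dropped, and a value exactly at a single cutoff is assumed to count as a blocker. All function names (`label_dual_threshold`, `single_threshold`, `build_training_set`, `class_counts`) and the example values are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

# Activity cutoffs in micromolar (μM), as stated in the abstract.
BLOCKER_MAX_UM = 1.0       # IC50/Ki <= 1 μM  -> blocker
NONBLOCKER_MIN_UM = 10.0   # IC50/Ki > 10 μM  -> nonblocker


def label_dual_threshold(activity_um: float) -> Optional[str]:
    """1 μM/10 μM binary threshold.

    Assumption: compounds with 1 μM < IC50/Ki <= 10 μM fall in a gray zone,
    get no label (None), and are therefore left out of the training set.
    """
    if activity_um <= BLOCKER_MAX_UM:
        return "blocker"
    if activity_um > NONBLOCKER_MIN_UM:
        return "nonblocker"
    return None


def single_threshold(cutoff_um: float) -> Callable[[float], Optional[str]]:
    """Single cutoff (e.g. 1 μM or 10 μM): every compound gets a label.

    Assumption: a value exactly at the cutoff counts as a blocker.
    """
    def rule(activity_um: float) -> Optional[str]:
        return "blocker" if activity_um <= cutoff_um else "nonblocker"
    return rule


def build_training_set(
    activities_um: Iterable[float],
    rule: Callable[[float], Optional[str]],
) -> List[Tuple[float, str]]:
    """Apply a labeling rule and drop compounds it leaves unlabeled."""
    labeled = [(a, rule(a)) for a in activities_um]
    return [(a, lab) for a, lab in labeled if lab is not None]


def class_counts(training_set: List[Tuple[float, str]]) -> Counter:
    """Count blockers vs nonblockers to inspect class balance."""
    return Counter(lab for _, lab in training_set)


if __name__ == "__main__":
    # Hypothetical IC50/Ki values in μM, for illustration only.
    values = [0.2, 0.8, 3.0, 7.5, 25.0]
    for name, rule in [
        ("1 uM/10 uM", label_dual_threshold),
        ("1 uM", single_threshold(1.0)),
        ("10 uM", single_threshold(10.0)),
    ]:
        ts = build_training_set(values, rule)
        print(name, dict(class_counts(ts)))
```

With these five hypothetical values, the binary threshold keeps only three compounds (the two clear blockers and the one clear nonblocker), while either single cutoff labels all five. Discarding the intermediate range trades training set size for cleaner class separation, and the class counts make the blocker/nonblocker imbalance described above easy to check.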