Stoyanova-Slavova Iva B, Slavov Svetoslav H, Buzatu Dan A, Beger Richard D, Wilkes Jon G
Division of Systems Biology, National Center for Toxicological Research, 3900 NCTR Road, Jefferson, AR 72079, USA.
J Mol Graph Model. 2017 Mar;72:246-255. doi: 10.1016/j.jmgm.2017.01.012. Epub 2017 Jan 16.
A dataset of 237 human Ether-à-go-go Related Gene (hERG) potassium channel inhibitors, collected from 22 literature sources, was modeled by 3D-SDAR; 180 of the compounds were used for model building and validation, and the remaining 57 constituted the "true" external prediction set. To produce reliable and reproducible classification models for hERG blocking, the initial set of 180 chemicals was split into two subsets: a balanced modeling set of 118 compounds and an unbalanced validation set of 62 compounds. A PLS bagging-like algorithm written in Matlab was used to process the data and assign each compound to one of two activity classes (hERG+ or hERG-). The best predictive model, evaluated on a fully randomized hold-out test set (comprising 20% of the modeling set), used 4 latent variables and a grid of 6 ppm × 6 ppm × 1 Å in the C-C region, 6 ppm × 30 ppm × 1 Å in the C-N region, and 30 ppm × 30 ppm × 1 Å in the N-N region. An overall accuracy of 0.84 was obtained for both the hold-out test set and the validation set. Further, an external prediction set of 57 drugs and drug derivatives was used to estimate the true predictive power of the reported 3D-SDAR model; the overall accuracy decreased slightly, to 0.77. A 3D-SDAR map of the most frequently occurring bins, projected onto the standard coordinate space of the chemical structures, allowed identification of a three-center toxicophore composed of two aromatic rings and an amino group. A U test along the distance axis of the most frequently occurring 3D-SDAR bins was used to set the distance limits of the toxicophore. This toxicophore was found to be similar to an earlier reported phospholipidosis (PLD) toxicophore.
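The region-specific grid described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab code. It assumes each atom pair is represented by two chemical shifts (in ppm) and an interatomic distance (in Å), and that a fingerprint is a count of atom pairs per bin. The names `BIN_WIDTHS`, `bin_index`, and `fingerprint`, and the example values, are hypothetical; only the bin widths come from the abstract.

```python
from collections import Counter
from math import floor

# Bin widths (ppm, ppm, angstrom) for each region, as quoted in the abstract.
BIN_WIDTHS = {
    "C-C": (6.0, 6.0, 1.0),
    "C-N": (6.0, 30.0, 1.0),
    "N-N": (30.0, 30.0, 1.0),
}


def bin_index(region, shift_a, shift_b, distance):
    """Map one atom pair to its grid cell (assumed representation)."""
    width_a, width_b, width_d = BIN_WIDTHS[region]
    return (
        region,
        floor(shift_a / width_a),
        floor(shift_b / width_b),
        floor(distance / width_d),
    )


def fingerprint(atom_pairs):
    """Count how many atom pairs fall into each occupied bin."""
    return Counter(bin_index(*pair) for pair in atom_pairs)


# Illustrative values only: (region, shift A in ppm, shift B in ppm, distance in angstrom).
example_pairs = [
    ("C-C", 128.5, 130.2, 1.4),
    ("C-C", 129.0, 131.9, 1.4),
    ("C-N", 128.5, 45.0, 3.7),
    ("N-N", 45.0, 310.0, 5.2),
]

fp = fingerprint(example_pairs)
for cell, count in sorted(fp.items()):
    print(cell, count)
```

In this sketch the two C-C pairs fall into the same 6 ppm × 6 ppm × 1 Å cell, so that bin is counted twice. Bin occupancies of this kind are the sort of descriptor a PLS classifier could take as input.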