A comprehensive support vector machine binary hERG classification model based on extensive but biased end point hERG data sets.

Affiliation

Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, Taiwan 106.

Publication information

Chem Res Toxicol. 2011 Jun 20;24(6):934-49. doi: 10.1021/tx200099j. Epub 2011 May 6.

Abstract

The human ether-a-go-go related gene (hERG) potassium ion channel plays a key role in cardiotoxicity and is therefore a key target in preclinical drug discovery toxicity screening. The PubChem hERG Bioassay data set, composed of 1668 compounds, was used to construct an in silico screening model. The corresponding trial models were constructed from a descriptor pool composed of 4D fingerprints (4D-FP) and traditional 2D and 3D VolSurf-like molecular descriptors. A final binary classification model was constructed via a support vector machine (SVM). The resultant model was then validated using the PubChem hERG Bioassay data set (AID 376) and an external hERG data set by evaluating the model's ability to distinguish hERG blockers from nonblockers. The external data set (the test set) consisted of 356 compounds collected from the available literature: 287 actives and 69 inactives. Four different sampling protocols and a 10-fold cross-validation analysis, used in the validation process to evaluate the classification models, explored the impact of the active/inactive imbalance in the PubChem high-throughput data set. Four different data sets were explored, and the one employing Lipinski's rule-of-five coupled with measures of relative molecular lipophilicity performed best both in the 10-fold cross-validation of the training data set and in overall prediction accuracy on the external test sets. The linear SVM binary classification model building strategy was applied to different combinations of MOE (traditional 2D, "2½D", and 3D VolSurf-like) and 4D-FP molecular descriptors to further explore and refine previously proposed key descriptors, identify new significant features that contribute to the prediction of hERG toxicity, and construct the optimal SVM binary classification model from a reduced descriptor pool.
The accuracy, sensitivity, and specificity of the best model determined from 10-fold cross-validation are 95, 90, and 96%, respectively; the overall accuracy is near 87% for the external set. The models constructed in this study demonstrate the following: (i) robustness, based on accuracy across the structural diversity of the training set; (ii) the ability to predict a compound's "predisposition" to block hERG ion channels; and (iii) the ability to define and illustrate structural features that can be overlaid onto the chemical structures to aid in the 3D structure-activity interpretation of the hERG blocking effect.
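
For readers less familiar with the three reported metrics, the following is a minimal Python sketch of how accuracy, sensitivity, and specificity follow from confusion-matrix counts in a blocker (positive) versus nonblocker (negative) task. The class name `ConfusionCounts` and the "always active" baseline are illustrative assumptions, not part of the study; the abstract does not report the confusion matrix for the external set, so only the 287/69 class composition is taken from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary task: blocker = positive, nonblocker = negative.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp: int  # blockers predicted as blockers
    fn: int  # blockers predicted as nonblockers
    tn: int  # nonblockers predicted as nonblockers
    fp: int  # nonblockers predicted as blockers

    @property
    def accuracy(self) -> float:
        # Fraction of all compounds classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def sensitivity(self) -> float:
        # Fraction of true blockers that were recovered.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of true nonblockers that were recovered.
        return self.tn / (self.tn + self.fp)


# External-set composition stated in the abstract: 287 actives, 69 inactives.
N_ACTIVE, N_INACTIVE = 287, 69

# Assumed baseline for context only: label every compound as a blocker.
always_active = ConfusionCounts(tp=N_ACTIVE, fn=0, tn=0, fp=N_INACTIVE)

print(f"accuracy    = {always_active.accuracy:.3f}")     # 0.806
print(f"sensitivity = {always_active.sensitivity:.3f}")  # 1.000
print(f"specificity = {always_active.specificity:.3f}")  # 0.000
```

Because the external set is dominated by actives, this trivial baseline already reaches about 81% accuracy with zero specificity, which is why sensitivity and specificity are worth reading alongside an overall accuracy figure on an imbalanced set.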

