iDrug-Target: predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach.

Affiliation

Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333046, China.

Publication information

J Biomol Struct Dyn. 2015;33(10):2221-33. doi: 10.1080/07391102.2014.998710. Epub 2015 Jan 14.

Abstract

Information about the interactions of drug compounds with proteins in cellular networking is very important for drug development. Unfortunately, all the existing predictors for identifying drug-protein interactions were trained on a skewed benchmark dataset in which the number of non-interactive drug-protein pairs is overwhelmingly larger than that of the interactive ones. Training predictors on such a highly unbalanced benchmark dataset means that many interactive drug-protein pairs may be mispredicted as non-interactive. Since the minority interactive pairs often contain the most important information for drug design, it is necessary to minimize this kind of misprediction. In this study, we adopted the neighborhood cleaning rule and the synthetic minority over-sampling technique to treat the skewed benchmark datasets and balance the positive and negative subsets. The new benchmark datasets thus obtained are called the optimized benchmark datasets. Based on them, a new predictor called iDrug-Target was developed that contains four sub-predictors: iDrug-GPCR, iDrug-Chl, iDrug-Ezy, and iDrug-NR, specialized for identifying the interactions of drug compounds with GPCRs (G-protein-coupled receptors), ion channels, enzymes, and NRs (nuclear receptors), respectively. Rigorous cross-validations on a set of experiment-confirmed datasets indicated that these new predictors remarkably outperformed the existing ones for the same purpose. To maximize users' convenience, a publicly accessible Web server for iDrug-Target has been established at http://www.jci-bioinfo.cn/iDrug-Target/ , by which users can easily get their desired results. It has not escaped our notice that the aforementioned strategy can be widely used in many other areas as well.
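
The two balancing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each drug-protein pair has already been encoded as a numeric feature vector, and it uses a simplified cleaning step (only the edited-nearest-neighbor part of the neighborhood cleaning rule) followed by basic synthetic minority over-sampling by interpolation. The function names, the value of `k`, and the toy data are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def clean_majority(majority, minority, k=3):
    """Simplified cleaning step (assumption: edited-nearest-neighbor part only).

    A majority (negative) sample is dropped when at least half of its k
    nearest neighbors, taken over all other samples, are minority (positive).
    """
    labeled = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    kept = []
    for i in range(len(majority)):
        point = labeled[i][0]
        others = [
            (euclidean(point, q), label)
            for j, (q, label) in enumerate(labeled)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        positive_votes = sum(label for _, label in others[:k])
        if positive_votes * 2 < k:  # neighbors are mostly negative: keep it
            kept.append(point)
    return kept


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority samples.

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        neighbor = minority[rng.choice(neighbors)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data (hypothetical): 40 negative pairs and 8 positive pairs in 2-D.
demo_rng = random.Random(42)
negatives = [[demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)] for _ in range(40)]
positives = [[demo_rng.gauss(2, 0.5), demo_rng.gauss(2, 0.5)] for _ in range(8)]

cleaned_negatives = clean_majority(negatives, positives)
n_missing = max(0, len(cleaned_negatives) - len(positives))
balanced_positives = positives + smote(positives, n_missing, rng=demo_rng)
print(len(cleaned_negatives), len(balanced_positives))
```

After both steps the positive and negative subsets have the same size, which is the balance the optimized benchmark datasets aim for. The encoding of drug-protein pairs and the classifier trained on the balanced data are outside the scope of this sketch.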

