QSAR studies of the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors by multiple linear regression (MLR) and support vector machine (SVM).

Author information

Qin Zijian, Wang Maolin, Yan Aixia

Affiliations

State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, P.O. Box 53, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing 100029, PR China.

Publication information

Bioorg Med Chem Lett. 2017 Jul 1;27(13):2931-2938. doi: 10.1016/j.bmcl.2017.05.001. Epub 2017 May 3.

Abstract

In this study, quantitative structure-activity relationship (QSAR) models were built with multiple linear regression (MLR) and a support vector machine (SVM) to predict the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors, exploring various descriptor sets and training/test set selection methods. A dataset of 512 HCV NS3/4A protease inhibitors and their IC50 values, all determined by the same FRET assay, was collected from the literature. Each inhibitor was represented by nine selected global descriptors and 12 selected 2D property-weighted autocorrelation descriptors calculated with the program CORINA Symphony. The dataset was divided into a training set and a test set either randomly or with a Kohonen self-organizing map (SOM). The correlation coefficients (r) for the training and test sets were 0.75 and 0.72 for the best MLR model and 0.87 and 0.85 for the best SVM model, respectively. In addition, a series of sub-dataset models was developed. All of the best sub-dataset models performed better than the whole-dataset models. We believe that the combination of the best sub-dataset and whole-dataset SVM models can be used as a reliable lead design tool for new NS3/4A protease inhibitor scaffolds in a drug discovery pipeline.
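The evaluation scheme the abstract describes (a random training/test split, then a correlation coefficient r between measured and predicted activities on each set) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `random_split` and `pearson_r`, the 25% test fraction, and the seed are assumptions chosen for illustration, since the abstract does not state the split ratio, and the model-fitting step (MLR or SVM) is left out.

```python
import math
import random


def random_split(items, test_fraction=0.25, seed=0):
    """Randomly split items into (training, test) lists.

    test_fraction and seed are illustrative assumptions; the abstract
    does not report the ratio used in the study.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def pearson_r(observed, predicted):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (unnormalized; the factors of n cancel in r).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


# Split 512 compound indices (the dataset size given in the abstract).
train_ids, test_ids = random_split(range(512))
```

In practice `pearson_r` would be called once on the training set and once on the test set, with the fitted model's predictions as the second argument; a large gap between the two values would suggest overfitting.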
