Toward generating simpler QSAR models: nonlinear multivariate regression versus several neural network ensembles and some related methods.

Authors

Lucić Bono, Nadramija Damir, Basic Ivan, Trinajstić Nenad

Affiliation

The Rugjer Bosković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia.

Publication

J Chem Inf Comput Sci. 2003 Jul-Aug;43(4):1094-102. doi: 10.1021/ci025636j.

PMID:12870898
Abstract

In this study we test whether a simple modeling procedure used in QSAR/QSPR can produce simple models that are, at the same time, as accurate as robust Neural Network Ensemble (NNE) models. We present the results of applying two procedures for generating/selecting simple linear and nonlinear multiregression (MR) models: (1) a method for selecting the best possible MR models (named CROMRsel) and (2) the Genetic Function Approximation (GFA) method from the Cerius2 program package. The resulting MR models are rigorously compared with several NNE models. For the comparison we selected four QSAR data sets previously studied with NNEs (Tetko et al. J. Chem. Inf. Comput. Sci. 1996, 36, 794-803; Kovalishyn et al. J. Chem. Inf. Comput. Sci. 1998, 38, 651-659): (1) 51 benzodiazepine derivatives, (2) 37 carboquinone derivatives, (3) 74 pyrimidines, and (4) 31 antimycin analogues. These data sets were parameterized with 7, 6, 27, and 53 descriptors, respectively. The modeled properties were, respectively, anti-pentylenetetrazole activity, antileukemic activity, inhibition constants for dihydrofolate reductase from E. coli MB1428, and antifilarial activity. Nonlinearities were introduced into the MR models through 2-fold and/or 3-fold cross-products (products of two or three descriptors) of the initial (linear) descriptors. Then, using the CROMRsel and GFA programs (J. Chem. Inf. Comput. Sci. 1999, 39, 121-132), sets of the I best descriptors (I ≤ 8 in this paper), ranked by the fit and leave-one-out correlation coefficients, were selected for the multiregression models. Two classes of models were obtained: (1) linear or nonlinear MR models generated starting from the complete set of descriptors, and (2) nonlinear MR models generated starting from the same set of descriptors that was used in the NNE modeling. In addition, the descriptor selection method of CROMRsel was compared with the GFA method included in the QSAR module of Cerius2.
For every data set, the MR models were found to have better cross-validated statistical parameters than the corresponding NNE models, and CROMRsel selected somewhat better MR models than the GFA method. The MR models are also much simpler than the NNEs, which is an important and surprising finding, and they additionally express the calculated dependencies in functional form. Moreover, the MR models were shown to be better than all other models obtained with different methods on the same data sets ("old" multivariate regressions, functional-link-net models, back-propagation neural networks, genetic algorithm, and partial least squares models). This study also indicated that robust NNE models cannot generate good models when applied to small data sets, suggesting that robust methods such as NNEs are perhaps better applied to larger data sets.
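The abstract names the ingredients of the MR procedure (cross-products of the initial descriptors, selection of the I best descriptors, leave-one-out correlation coefficients) but not the search algorithm inside CROMRsel or GFA. The Python sketch below illustrates those ingredients under stated assumptions; it is not the authors' CROMRsel code, and the function names (`add_cross_products`, `loo_r`, `best_subset`) are hypothetical. It assumes that cross-products are formed only from distinct descriptors (squares and cubes omitted), that the leave-one-out coefficient is the Pearson correlation between observed values and LOO predictions (some authors instead report q² = 1 − PRESS/SS), and it uses a plain exhaustive subset search in place of CROMRsel's own selection procedure.

```python
from itertools import combinations
from math import sqrt


def add_cross_products(X, names, max_order=2):
    """Append products of 2 (and, if max_order=3, 3) distinct descriptors to each row.

    Assumption: squares/cubes of single descriptors are not generated.
    """
    combos = []
    for order in range(2, max_order + 1):
        combos.extend(combinations(range(len(names)), order))
    new_names = list(names) + ["*".join(names[i] for i in c) for c in combos]
    new_X = []
    for row in X:
        extra = []
        for c in combos:
            p = 1.0
            for i in c:
                p *= row[i]
            extra.append(p)
        new_X.append(list(row) + extra)
    return new_X, new_names


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system (collinear descriptors)")
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit(X, y):
    """Ordinary least squares for y = b0 + sum_j b_j * x_j via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)


def predict(coef, row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)


def loo_r(X, y):
    """Leave-one-out r: refit without each compound, predict it, correlate with observed y."""
    preds = []
    for i in range(len(y)):
        coef = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return pearson_r(y, preds)


def best_subset(X, y, names, size):
    """Score every descriptor subset of the given size by LOO r (exhaustive; small pools only)."""
    best = None
    for subset in combinations(range(len(names)), size):
        X_sub = [[row[j] for j in subset] for row in X]
        try:
            r = loo_r(X_sub, y)
        except (ValueError, ZeroDivisionError):
            continue  # skip singular or degenerate subsets
        if best is None or r > best[0]:
            best = (r, [names[j] for j in subset])
    return best


if __name__ == "__main__":
    # Made-up toy data: y depends only on the cross-product x1*x2.
    X = [[1, 2, 0.5], [2, 1, 1.5], [3, 3, 2.0], [4, 1, 0.0],
         [2, 5, 1.0], [5, 2, 3.0], [1, 4, 2.5], [3, 2, 1.0]]
    y = [1 + 2 * row[0] * row[1] for row in X]
    X_ext, names = add_cross_products(X, ["x1", "x2", "x3"])
    print(best_subset(X_ext, y, names, size=1))
```

On this toy set, where y = 1 + 2·x1·x2, the one-descriptor search over the expanded pool should select `x1*x2` with a leave-one-out r of essentially 1. Exhaustive search does not scale: with 53 initial descriptors the pairwise products alone add C(53, 2) = 1378 terms, so choosing I = 8 from that pool by brute force is infeasible, and a stepwise or heuristic search would be needed in practice.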

Similar articles

1
Toward generating simpler QSAR models: nonlinear multivariate regression versus several neural network ensembles and some related methods.
J Chem Inf Comput Sci. 2003 Jul-Aug;43(4):1094-102. doi: 10.1021/ci025636j.
2
Nonlinear multivariate regression outperforms several concisely designed neural networks on three QSPR data sets.
J Chem Inf Comput Sci. 2000 Mar;40(2):403-13. doi: 10.1021/ci990061k.
3
Use of variable selection in modeling the secondary structural content of proteins from their composition of amino acid residues.
J Chem Inf Comput Sci. 2004 Jan-Feb;44(1):113-21. doi: 10.1021/ci034037p.
4
Predictive QSAR modeling of HIV reverse transcriptase inhibitor TIBO derivatives.
Eur J Med Chem. 2009 Apr;44(4):1509-24. doi: 10.1016/j.ejmech.2008.07.020. Epub 2008 Jul 24.
5
A comparison of methods for modeling quantitative structure-activity relationships.
J Med Chem. 2004 Oct 21;47(22):5541-54. doi: 10.1021/jm0497141.
6
Anticancer activity of selected phenolic compounds: QSAR studies using ridge regression and neural networks.
Chem Biol Drug Des. 2007 Nov;70(5):424-36. doi: 10.1111/j.1747-0285.2007.00575.x.
7
QSAR modeling using chirality descriptors derived from molecular topology.
J Chem Inf Comput Sci. 2003 Jan-Feb;43(1):144-54. doi: 10.1021/ci025516b.
8
Combinatorial QSAR modeling of P-glycoprotein substrates.
J Chem Inf Model. 2006 May-Jun;46(3):1245-54. doi: 10.1021/ci0504317.
9
QSAR--how good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets.
J Chem Inf Model. 2006 Sep-Oct;46(5):1924-36. doi: 10.1021/ci050413p.
10
Genetic neural networks for quantitative structure-activity relationships: improvements and application of benzodiazepine affinity for benzodiazepine/GABAA receptors.
J Med Chem. 1996 Dec 20;39(26):5246-56. doi: 10.1021/jm960536o.