Quantitative structure-retention relationship for the Kovats retention indices of a large set of terpenes: a combined data splitting-feature selection strategy.

Authors

Hemmateenejad Bahram, Javadnia Katayoun, Elyasi Maryam

Affiliation

Chemistry Department, Shiraz University, Shiraz 71454, Iran.

Publication details

Anal Chim Acta. 2007 May 29;592(1):72-81. doi: 10.1016/j.aca.2007.04.009. Epub 2007 Apr 8.

DOI: 10.1016/j.aca.2007.04.009
PMID: 17499073

Abstract

A data set of terpenoids, compounds that are widely distributed in nature and abundant in higher plants, was used to develop a quantitative structure-property relationship (QSPR) for their Kovats retention indices. QSPR models are usually obtained by splitting the data into two sets: calibration (or training) and prediction (or validation). All model-building steps, especially feature selection, are performed using this initial split, so the performance of the resulting models depends strongly on it. To investigate the effect of data splitting on feature selection, we propose a combined data splitting-feature selection (CDFS) methodology for QSPR model development, which produces several different training/validation/test sets and repeats all of the model-building steps for each. In this method, the data are split many times, and feature selection is performed for each split. The resulting models are compared for similarity and dissimilarity among the selected descriptors. The final model is the one whose descriptors are the variables common to all of the resulting models. The method was applied to a QSPR study of a large data set containing the Kovats retention indices of 573 terpenoids. A final 8-parameter multilinear model with constitutional and topological indices was obtained. Cross-validation indicated that the model could reproduce more than 90% of the variance in the Kovats retention data. The relative error of prediction for an external test set of 50 compounds was 3.2%. Finally, to improve the results, the structure-retention relationships were also modeled with a nonlinear approach using artificial neural networks, which gave better results.
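
The CDFS idea described in the abstract (split repeatedly, select features on each split, keep only the descriptors common to every split) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic, the selector is a simple greedy forward selection scored on a validation split, and the names `fit_mlr`, `predict_mlr`, `forward_select`, and `cdfs` are hypothetical. The abstract does not state which selection algorithm or how many splits the authors used.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares multilinear fit with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict with coefficients returned by fit_mlr."""
    return coef[0] + X @ coef[1:]

def forward_select(X_tr, y_tr, X_val, y_val, max_features):
    """Greedy forward selection (illustrative stand-in, not the paper's
    selector): add the descriptor that most lowers validation RMSE."""
    selected, best_rmse = [], np.inf
    while len(selected) < max_features:
        candidates = [j for j in range(X_tr.shape[1]) if j not in selected]
        scores = []
        for j in candidates:
            cols = selected + [j]
            coef = fit_mlr(X_tr[:, cols], y_tr)
            resid = y_val - predict_mlr(coef, X_val[:, cols])
            scores.append(np.sqrt(np.mean(resid ** 2)))
        k = int(np.argmin(scores))
        if scores[k] >= best_rmse:  # stop when no candidate improves
            break
        best_rmse = scores[k]
        selected.append(candidates[k])
    return set(selected)

def cdfs(X, y, n_splits=20, train_frac=0.7, max_features=8, seed=0):
    """Repeat random training/validation splits, run feature selection on
    each, and return the descriptors selected in every split."""
    rng = np.random.default_rng(seed)
    n_tr = int(train_frac * len(y))
    common = None
    for _ in range(n_splits):
        perm = rng.permutation(len(y))
        tr, val = perm[:n_tr], perm[n_tr:]
        chosen = forward_select(X[tr], y[tr], X[val], y[val], max_features)
        common = chosen if common is None else common & chosen
    return sorted(common)

# Synthetic demo: 250 "compounds", 15 descriptors, 3 of which are informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(250, 15))
y = 1000 + 30 * X[:, 0] - 20 * X[:, 3] + 15 * X[:, 7] + rng.normal(size=250)

# Hold out an external test set of 50 rows before any splitting.
X_model, y_model = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

common = cdfs(X_model, y_model)
coef = fit_mlr(X_model[:, common], y_model)
pred = predict_mlr(coef, X_test[:, common])
rel_err = float(np.mean(np.abs(pred - y_test) / np.abs(y_test)))
print("common descriptors:", common)
print("mean relative error on external test set:", rel_err)
```

Taking the intersection is the strictest reading of "common variables between all of resulted models"; a softer variant would keep descriptors selected in, say, most of the splits, but the abstract does not indicate that the authors did this.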

Similar articles

1
Quantitative structure-retention relationship for the Kovats retention indices of a large set of terpenes: a combined data splitting-feature selection strategy.
Anal Chim Acta. 2007 May 29;592(1):72-81. doi: 10.1016/j.aca.2007.04.009. Epub 2007 Apr 8.
2
Linear and nonlinear quantitative structure-property relationship models for solubility of some anthraquinone, anthrone and xanthone derivatives in supercritical carbon dioxide.
Anal Chim Acta. 2008 Mar 3;610(1):25-34. doi: 10.1016/j.aca.2008.01.011. Epub 2008 Jan 15.
3
QSPR models for half-wave reduction potential of steroids: a comparative study between feature selection and feature extraction from subsets of or entire set of descriptors.
Anal Chim Acta. 2009 Feb 16;634(1):27-35. doi: 10.1016/j.aca.2008.11.062. Epub 2008 Dec 6.
4
QSPR modeling of soil sorption coefficients (K(OC)) of pesticides using SPA-ANN and SPA-MLR.
J Agric Food Chem. 2009 Aug 12;57(15):7153-8. doi: 10.1021/jf9008839.
5
Predictions of chromatographic retention indices of alkylphenols with support vector machines and multiple linear regression.
J Sep Sci. 2009 Dec;32(23-24):4133-42. doi: 10.1002/jssc.200900373.
6
Benchmarking of linear and nonlinear approaches for quantitative structure-property relationship studies of metal complexation with ionophores.
J Chem Inf Model. 2006 Mar-Apr;46(2):808-19. doi: 10.1021/ci0504216.
7
Quantitative study of the structure-retention index relationship in the imine family.
J Chromatogr A. 2006 Jan 13;1102(1-2):238-44. doi: 10.1016/j.chroma.2005.10.019. Epub 2005 Nov 8.
8
Quantitative structure-activity relationship modeling of juvenile hormone mimetic compounds for Culex pipiens larvae, with a discussion of descriptor-thinning methods.
J Chem Inf Model. 2006 Jan-Feb;46(1):65-77. doi: 10.1021/ci050215y.
9
Prediction of HPLC retention index using artificial neural networks and IGroup E-state indices.
J Chem Inf Model. 2009 Apr;49(4):788-99. doi: 10.1021/ci9000162.
10
Artificial neural network prediction of retention factors of some benzene derivatives and heterocyclic compounds in micellar electrokinetic chromatography.
Electrophoresis. 2005 Sep;26(18):3438-44. doi: 10.1002/elps.200500203.

Cited by

1
Insight into the Structural Determinants of Imidazole Scaffold-Based Derivatives as TNF-α Release Inhibitors by in Silico Explorations.
Int J Mol Sci. 2015 Aug 25;16(9):20118-38. doi: 10.3390/ijms160920118.
2
An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data.
Anal Chim Acta. 2014 Jan 2;806:117-27. doi: 10.1016/j.aca.2013.10.050. Epub 2013 Nov 6.
3
Applying in-silico retention index and mass spectra matching for identification of unknown metabolites in accurate mass GC-TOF mass spectrometry.
Anal Chem. 2011 Aug 1;83(15):5895-902. doi: 10.1021/ac2006137. Epub 2011 Jun 28.