Modelling methods and cross-validation variants in QSAR: a multi-level analysis.

Affiliations

a Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary.

b Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary.

Publication information

SAR QSAR Environ Res. 2018 Sep;29(9):661-674. doi: 10.1080/1062936X.2018.1505778. Epub 2018 Aug 30.

PMID: 30160175
Abstract

Prediction performance often depends on the cross- and test-validation protocols applied. Several combinations of cross-validation variants and model-building techniques were used to reveal their complexity. Two case studies (acute toxicity data) were examined, applying five-fold cross-validation (CV) in random, contiguous-block and Venetian blind forms, as well as leave-one-out CV. External test sets showed the effects of, and differences between, the validation protocols. The models were generated with multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, artificial neural networks (ANN) and support vector machines (SVM). The comparisons were made by the sum of ranking differences (SRD) and factorial analysis of variance (ANOVA). The largest bias and variance could be assigned to the MLR method and to contiguous-block cross-validation. SRD can provide a unique and unambiguous ranking of methods and CV variants. Venetian blind cross-validation is a promising tool. The generated models were also compared based on their basic performance parameters (r² and Q²). MLR produced the largest gap between r² and Q², while PCR gave the smallest. Although PCR is the best validated and most balanced technique, SVM always outperformed the other methods when experimental values were the benchmark. Variable selection was advantageous, and the choice of modelling method had a larger influence than the CV variant.
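The abstract compares three ways of forming the five folds (random, contiguous blocks, Venetian blinds) with leave-one-out. A minimal sketch of these fold assignments follows; it is an illustration under stated assumptions, not the authors' code. Assumptions: samples keep their stored order, contiguous blocks take consecutive runs of about n/k samples, Venetian blinds put every k-th sample into the same fold, and a one-descriptor least-squares line stands in for the MLR, PCR, PLS, ANN and SVM models. Q² is computed in its common form, 1 - PRESS/TSS, from out-of-fold predictions. The names `fold_ids`, `fit_line` and `q2` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fold_ids(n: int, k: int, scheme: str, seed: int = 0) -> List[int]:
    """Assign each of n samples, in their stored order, to a fold label."""
    if scheme == "contiguous":
        # Contiguous blocks: the first ~n/k samples form fold 0, the next block fold 1, ...
        return [i * k // n for i in range(n)]
    if scheme == "venetian":
        # Venetian blinds: every k-th sample shares a fold (0, 1, ..., k-1, 0, 1, ...).
        return [i % k for i in range(n)]
    if scheme == "random":
        # Random: shuffle the indices, then deal them out so fold sizes differ by at most one.
        order = list(range(n))
        random.Random(seed).shuffle(order)
        ids = [0] * n
        for pos, i in enumerate(order):
            ids[i] = pos % k
        return ids
    if scheme == "loo":
        # Leave-one-out: every sample is its own fold, so k is ignored.
        return list(range(n))
    raise ValueError(f"unknown scheme: {scheme!r}")


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x (a one-descriptor stand-in for MLR)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def q2(x: Sequence[float], y: Sequence[float], ids: Sequence[int]) -> float:
    """Cross-validated Q^2 = 1 - PRESS/TSS, using only out-of-fold predictions."""
    press = 0.0
    for fold in set(ids):
        train = [i for i, f in enumerate(ids) if f != fold]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        # Each sample is predicted exactly once, by the model that did not see it.
        press += sum((y[i] - (a + b * x[i])) ** 2 for i, f in enumerate(ids) if f == fold)
    my = sum(y) / len(y)
    tss = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / tss


if __name__ == "__main__":
    # Hypothetical toy data, sorted by the descriptor so that data order matters.
    x = list(range(10))
    y = [1 + 2 * xi + (0.5 if xi % 2 else -0.5) for xi in x]
    print(fold_ids(10, 5, "contiguous"))  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
    print(fold_ids(10, 5, "venetian"))    # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
    for scheme in ("random", "contiguous", "venetian", "loo"):
        print(scheme, round(q2(x, y, fold_ids(len(x), 5, scheme)), 3))
```

Because the toy data are sorted by x, the contiguous scheme holds out whole segments of the descriptor range (the first and last folds force extrapolation), while the Venetian blind scheme spreads every fold across the range. On randomly ordered data both schemes tend to behave much like random folds, so any difference between them depends on how the data set is ordered.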

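The abstract ranks modelling methods and CV variants with the sum of ranking differences (SRD). The core computation can be sketched as follows, assuming the methods (or method/CV combinations) are columns and the objects being compared, e.g. compounds, are rows: each column and a reference column are converted to ranks, and a method's SRD is the sum of absolute rank differences, so a smaller SRD means closer agreement with the reference. The reference is either the experimental values or, when no gold standard is chosen, the row-wise average, a common consensus choice. Tied values receive averaged ranks. The function names and data are hypothetical, and the sketch leaves out the validation steps that normally accompany SRD (comparison with rankings of random numbers and cross-validation of the SRD values).

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in increasing order (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the block while the next value ties with the first one in the block.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks


def srd(columns: Dict[str, Sequence[float]], reference: Sequence[float]) -> Dict[str, float]:
    """Sum of absolute rank differences of each column against the reference column."""
    ref_ranks = average_ranks(reference)
    return {
        name: sum(abs(r - q) for r, q in zip(average_ranks(col), ref_ranks))
        for name, col in columns.items()
    }


def row_means(columns: Dict[str, Sequence[float]]) -> List[float]:
    """Consensus reference: the average of all columns for each object (row)."""
    cols = list(columns.values())
    return [sum(c[i] for c in cols) / len(cols) for i in range(len(cols[0]))]


if __name__ == "__main__":
    # Hypothetical predictions of three models for five compounds.
    experimental = [1.0, 2.0, 3.0, 4.0, 5.0]
    models = {
        "A": [1.1, 2.1, 2.9, 4.2, 5.0],  # same order as the reference
        "B": [5.0, 4.0, 3.0, 2.0, 1.0],  # fully reversed order
        "C": [1.0, 3.0, 2.0, 4.0, 5.0],  # two neighbours swapped
    }
    print(srd(models, experimental))  # {'A': 0.0, 'B': 12.0, 'C': 2.0}
    print(srd(models, row_means(models)))
```

Only the order of the values matters to SRD, so model A scores 0 against the experimental reference even though its predictions are not exact; this is why the abstract pairs SRD with the r² and Q² performance parameters.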

Similar articles

1
Modelling methods and cross-validation variants in QSAR: a multi-level analysis.
SAR QSAR Environ Res. 2018 Sep;29(9):661-674. doi: 10.1080/1062936X.2018.1505778. Epub 2018 Aug 30.
2
Chemometrics-assisted simultaneous voltammetric determination of ascorbic acid, uric acid, dopamine and nitrite: application of non-bilinear voltammetric data for exploiting first-order advantage.
Talanta. 2014 Feb;119:553-63. doi: 10.1016/j.talanta.2013.11.028. Epub 2013 Nov 27.
3
Development of linear and nonlinear predictive QSAR models and their external validation using molecular similarity principle for anti-HIV indolyl aryl sulfones.
J Enzyme Inhib Med Chem. 2008 Dec;23(6):980-95. doi: 10.1080/14756360701811379.
4
Quantitative structure-property relationship modelling of the degradability rate constant of alkenes by OH radicals in atmosphere.
SAR QSAR Environ Res. 2009;20(1-2):77-90. doi: 10.1080/10629360902726700.
5
A self-adaptive genetic algorithm-artificial neural network algorithm with leave-one-out cross validation for descriptor selection in QSAR study.
J Comput Chem. 2010 Jul 30;31(10):1956-68. doi: 10.1002/jcc.21471.
6
Exploring QSARs of vascular endothelial growth factor receptor-2 (VEGFR-2) tyrosine kinase inhibitors by MLR, PLS and PC-ANN.
Curr Pharm Des. 2013;19(12):2237-44. doi: 10.2174/1381612811319120010.
7
Comparison of Multiple Linear Regressions and Neural Networks based QSAR models for the design of new antitubercular compounds.
Eur J Med Chem. 2013;70:831-45. doi: 10.1016/j.ejmech.2013.10.029. Epub 2013 Oct 23.
8
Sum of ranking differences (SRD) to ensemble multivariate calibration model merits for tuning parameter selection and comparing calibration methods.
Anal Chim Acta. 2015 Apr 15;869:21-33. doi: 10.1016/j.aca.2014.12.056. Epub 2015 Feb 7.
9
MIA-QSAR coupled to different regression methods for the modeling of antimalarial activities of 2-aziridinyl and 2,3-bis-(aziridinyl)-1,4-naphtoquinonyl sulfate and acylate derivatives.
Med Chem. 2011 Nov;7(6):645-54. doi: 10.2174/157340611797928343.
10
QSAR modelling of water quality indices of alkylphenol pollutants.
SAR QSAR Environ Res. 2007 Oct-Dec;18(7-8):729-43. doi: 10.1080/10629360701698761.

Cited by

1
Soft Independent Modeling of Class Analogies for the Screening of New Psychoactive Substances through UPLC-HRMS/MS.
Anal Chem. 2025 Jul 22;97(28):15420-15429. doi: 10.1021/acs.analchem.5c02450. Epub 2025 Jul 11.
2
Discrepant Spatiotemporal Characteristics of Gait Impairments in Thalamic Infarction Patients.
Brain Behav. 2025 May;15(5):e70582. doi: 10.1002/brb3.70582.
3
Short-Wave Infrared Hyperspectral Image-Based Quality Grading of Dried Laver ( spp.).
Foods. 2025 Feb 4;14(3):497. doi: 10.3390/foods14030497.
4
Detecting Grapevine Virus Infections in Red and White Winegrape Canopies Using Proximal Hyperspectral Sensing.
Sensors (Basel). 2023 Mar 6;23(5):2851. doi: 10.3390/s23052851.
5
Identification of Coronary Artery Diseases Using Photoplethysmography Signals and Practical Feature Selection Process.
Bioengineering (Basel). 2023 Feb 13;10(2):249. doi: 10.3390/bioengineering10020249.
6
Machine Learning of Raman Spectroscopy Data for Classifying Cancers: A Review of the Recent Literature.
Diagnostics (Basel). 2022 Jun 17;12(6):1491. doi: 10.3390/diagnostics12061491.
7
Pretreatment DCE-MRI-Based Deep Learning Outperforms Radiomics Analysis in Predicting Pathologic Complete Response to Neoadjuvant Chemotherapy in Breast Cancer.
Front Oncol. 2022 Mar 10;12:846775. doi: 10.3389/fonc.2022.846775. eCollection 2022.
8
Machine learning models for classification tasks related to drug safety.
Mol Divers. 2021 Aug;25(3):1409-1424. doi: 10.1007/s11030-021-10239-x. Epub 2021 Jun 10.
9
Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics.
Molecules. 2019 Aug 1;24(15):2811. doi: 10.3390/molecules24152811.
10
Intercorrelation Limits in Molecular Descriptor Preselection for QSAR/QSPR.
Mol Inform. 2019 Aug;38(8-9):e1800154. doi: 10.1002/minf.201800154. Epub 2019 Apr 4.