External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean.

Authors

Schüürmann Gerrit, Ebert Ralf-Uwe, Chen Jingwen, Wang Bin, Kühne Ralph

Affiliation

Department of Ecological Chemistry, UFZ Helmholtz Centre for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany.

Publication

J Chem Inf Model. 2008 Nov;48(11):2140-5. doi: 10.1021/ci800253u.

PMID:18954136

Abstract

The external prediction capability of quantitative structure-activity relationship (QSAR) models is often quantified using the predictive squared correlation coefficient, q². This index relates the predictive residual sum of squares, PRESS, to the activity sum of squares, SS, without postprocessing of the model output, whereas such postprocessing is applied automatically when the conventional squared correlation coefficient, r², is calculated. According to the current OECD guidelines, q² for external validation should be calculated with SS referring to the training set activity mean. Our present findings, including a mathematical proof, demonstrate that this approach yields a systematic overestimation of the prediction capability that is triggered by the difference between the training and test set activity means. Example calculations with three regression models and data sets taken from the literature further show that, for external test sets, q² based on the training set activity mean may become even larger than r². As a consequence, we suggest always using the test set activity mean when quantifying the external prediction capability through q², and revising the respective OECD guidance document accordingly. The discussion includes a comparison between r² and q² value ranges and the q² statistics for cross-validation.
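The contrast between the two choices of mean can be made concrete in a few lines of code. The sketch below is a minimal illustration, not the authors' implementation: it assumes q² = 1 - PRESS/SS, with SS taken about either the test-set or the training-set activity mean, and treats r² as the squared Pearson correlation between observed and predicted test-set activities. The function names `q2_external` and `squared_corr` and all activity values are hypothetical, chosen only so that the test-set mean (6.5) lies well above the training-set mean (3.0); they are not taken from the three literature data sets analysed in the paper.

```python
import numpy as np


def q2_external(y_obs, y_pred, reference_mean):
    """Predictive squared correlation coefficient, q^2 = 1 - PRESS / SS.

    SS is the activity sum of squares about `reference_mean`, which is
    either the test-set or the training-set activity mean.
    (Hypothetical helper for illustration.)
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_obs - y_pred) ** 2)       # predictive residual sum of squares
    ss = np.sum((y_obs - reference_mean) ** 2)  # activity sum of squares
    return 1.0 - press / ss


def squared_corr(y_obs, y_pred):
    """Conventional r^2: squared Pearson correlation of observed vs predicted.
    (Hypothetical helper for illustration.)"""
    r = np.corrcoef(y_obs, y_pred)[0, 1]
    return r ** 2


# Hypothetical activities (illustrative only, not from the paper).
y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # training-set mean = 3.0
y_test_obs = np.array([5.0, 6.0, 7.0, 8.0])    # test-set mean = 6.5
y_test_pred = np.array([5.5, 5.8, 7.4, 7.6])   # external predictions of the model

q2_te = q2_external(y_test_obs, y_test_pred, y_test_obs.mean())
q2_tr = q2_external(y_test_obs, y_test_pred, y_train.mean())
r2_val = squared_corr(y_test_obs, y_test_pred)

print(f"q2 (test-set mean):     {q2_te:.3f}")   # 0.878
print(f"q2 (training-set mean): {q2_tr:.3f}")   # 0.989
print(f"r2:                     {r2_val:.3f}")  # 0.895
```

With these numbers PRESS = 0.61, SS about the test-set mean is 5.0 and SS about the training-set mean is 54.0, so the training-mean q² (about 0.989) exceeds both the test-mean q² (0.878) and r² (about 0.895). In general, SS about the training-set mean equals SS about the test-set mean plus n_test(ȳ_test - ȳ_train)², so the training-mean variant can never be smaller and grows with the gap between the two means. The test-mean q², by contrast, cannot exceed r², because r² corresponds to the best least-squares line through the predictions while q² uses the predictions unadjusted.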

Similar articles

External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean.

J Chem Inf Model. 2008 Nov;48(11):2140-5. doi: 10.1021/ci800253u.

Comments on the definition of the Q2 parameter for QSAR validation.

J Chem Inf Model. 2009 Jul;49(7):1669-78. doi: 10.1021/ci900115y.

Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis.

J Chem Inf Model. 2008 Apr;48(4):766-84. doi: 10.1021/ci700443v. Epub 2008 Mar 1.

Predictive QSAR modeling of HIV reverse transcriptase inhibitor TIBO derivatives.

Eur J Med Chem. 2009 Apr;44(4):1509-24. doi: 10.1016/j.ejmech.2008.07.020. Epub 2008 Jul 24.

Determination and prediction of xenoestrogens by recombinant yeast-based assay and QSAR.

Chemosphere. 2009 Mar;74(9):1152-7. doi: 10.1016/j.chemosphere.2008.11.081. Epub 2009 Jan 10.

QSPR modeling bioconcentration factor (BCF) by balance of correlations.

Eur J Med Chem. 2009 Jun;44(6):2544-51. doi: 10.1016/j.ejmech.2009.01.023. Epub 2009 Jan 31.

Three-dimensional QSAR analyses of 1,3,4-trisubstituted pyrrolidine-based CCR5 receptor inhibitors.

Eur J Med Chem. 2008 Dec;43(12):2724-34. doi: 10.1016/j.ejmech.2008.01.040. Epub 2008 Feb 8.

Local and global quantitative structure-activity relationship modeling and prediction for the baseline toxicity.

J Chem Inf Model. 2007 Jan-Feb;47(1):159-69. doi: 10.1021/ci600299j.

Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient.

J Chem Inf Model. 2011 Sep 26;51(9):2320-35. doi: 10.1021/ci200211n. Epub 2011 Aug 12.

Prediction of 31P nuclear magnetic resonance chemical shifts for phosphines.

Spectrochim Acta A Mol Biomol Spectrosc. 2007 Jul;67(3-4):837-46. doi: 10.1016/j.saa.2006.08.041. Epub 2006 Sep 5.

Cited by

NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf212.

Prediction of the water solubility by a graph convolutional-based neural network on a highly curated dataset.

J Cheminform. 2025 Apr 21;17(1):55. doi: 10.1186/s13321-025-01000-9.

Computational design of potent dimeric phenylthiazole NS5A inhibitors for hepatitis C virus.

Sci Rep. 2024 Dec 30;14(1):31655. doi: 10.1038/s41598-024-80082-1.

Predicting the Time-Dependent Toxicities of Binary Mixtures of Five Antibiotics to sp.- Based on the QSAR Model.

Environ Health (Wash). 2024 Apr 17;2(7):465-473. doi: 10.1021/envhealth.4c00001. eCollection 2024 Jul 19.

First report on exploration of structural features of natural compounds (NPACT database) for anti-breast cancer activity (MCF-7): QSAR-based virtual screening, molecular docking, ADMET, MD simulation, and DFT studies.

In Silico Pharmacol. 2024 Oct 19;12(2):92. doi: 10.1007/s40203-024-00266-5. eCollection 2024.

Predicting anti-trypanosome effect of carbazole-derived compounds by powerful SVM with novel kernel function and comprehensive learning PSO.

Antimicrob Agents Chemother. 2024 Jul 9;68(7):e0026524. doi: 10.1128/aac.00265-24. Epub 2024 May 29.

Monitoring Flow-Forming Processes Using Design of Experiments and a Machine Learning Approach Based on Randomized-Supervised Time Series Forest and Recursive Feature Elimination.

Sensors (Basel). 2024 Feb 27;24(5):1527. doi: 10.3390/s24051527.

Integrated predictive QSAR, Read Across, and q-RASAR analysis for diverse agrochemical phytotoxicity in oat and corn: A consensus-based approach for risk assessment and prioritization.

Environ Sci Pollut Res Int. 2024 Feb;31(8):12371-12386. doi: 10.1007/s11356-024-31872-7. Epub 2024 Jan 17.

Computational investigation of unsaturated ketone derivatives as MAO-B inhibitors by using QSAR, ADME/Tox, molecular docking, and molecular dynamics simulations.

Turk J Chem. 2021 Dec 18;46(3):687-703. doi: 10.55730/1300-0527.3360. eCollection 2022.

Structural disconnection is associated with disability in the neuromyelitis optica spectrum disorder.

Brain Imaging Behav. 2023 Dec;17(6):664-673. doi: 10.1007/s11682-023-00792-4. Epub 2023 Sep 7.
