QSAR modeling based on the bias/variance compromise: a harmonious and parsimonious approach.

Author information

Kalivas John H, Forrester Joel B, Seipel Heather A

Affiliation

Department of Chemistry, Idaho State University, Pocatello, ID 83209, USA.

Publication information

J Comput Aided Mol Des. 2004 Jul-Sep;18(7-9):537-47. doi: 10.1007/s10822-004-4063-5.

DOI:10.1007/s10822-004-4063-5

PMID:15729853

Abstract

Modeling quantitative structure-activity relationships (QSAR) is considered with an emphasis on prediction. An abundance of methods are available to develop such models. Using a harmonious approach that balances the bias and variance of predictions, the best calibration models are identified relative to the bias and variance criteria used. Criteria utilized to determine the adequacy of models are the root mean square error of calibration (RMSEC) and validation (RMSEV), respective R² values, and the norm of the regression vector. QSAR data from the literature are used to demonstrate concepts. For these data sets and criteria used, it is suggested that models obtained by ridge regression (RR) are more harmonious and parsimonious than models obtained by partial least squares (PLS) and principal component regression (PCR) when the data is mean-centered. The most harmonious RR models have the best bias/variance tradeoff, reflected by the smallest RMSEC, RMSEV, and regression vector norms and the largest calibration and validation R² values. The most parsimonious RR models have the smallest effective rank.
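The criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the descriptor matrix is synthetic, the function name `ridge_merits` and the tuning values are hypothetical, and the effective rank is assumed here to be the common ridge definition, the sum of σᵢ²/(σᵢ² + λ) over the singular values of the mean-centered calibration matrix. The paper may define it differently.

```python
import numpy as np

def ridge_merits(X_cal, y_cal, X_val, y_val, lam):
    """Fit ridge regression on mean-centered data and return model merits.

    Hypothetical helper for illustration; lam is the ridge tuning parameter.
    """
    # Mean-center with calibration statistics only.
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    Xc, yc = X_cal - x_mean, y_cal - y_mean

    # Ridge solution via the SVD of the centered calibration matrix.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    b = Vt.T @ ((s / (s**2 + lam)) * (U.T @ yc))

    # Bias-type criterion (RMSEC) and prediction criterion (RMSEV).
    rmsec = np.sqrt(np.mean((Xc @ b - yc) ** 2))
    y_val_hat = (X_val - x_mean) @ b + y_mean
    rmsev = np.sqrt(np.mean((y_val_hat - y_val) ** 2))

    return {
        "lam": lam,
        "b": b,
        "rmsec": rmsec,
        "rmsev": rmsev,
        "norm_b": np.linalg.norm(b),                # variance-type criterion
        "eff_rank": np.sum(s**2 / (s**2 + lam)),    # assumed definition
    }

# Synthetic stand-in for a QSAR data set: 40 compounds, 10 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=40)
X_cal, y_cal, X_val, y_val = X[:30], y[:30], X[30:], y[30:]

results = [ridge_merits(X_cal, y_cal, X_val, y_val, lam)
           for lam in (0.01, 1.0, 100.0)]
for r in results:
    print(f"lam={r['lam']:<6} RMSEC={r['rmsec']:.3f} RMSEV={r['rmsev']:.3f} "
          f"||b||={r['norm_b']:.3f} eff_rank={r['eff_rank']:.2f}")
```

As λ grows, RMSEC rises while the regression vector norm and the effective rank fall. Plotting RMSEC or RMSEV against the norm across a range of λ values is one way to look for a compromise between the two kinds of criteria.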
