QSAR modeling based on the bias/variance compromise: a harmonious and parsimonious approach.

Authors

Kalivas John H, Forrester Joel B, Seipel Heather A

Affiliation

Department of Chemistry, Idaho State University, Pocatello, ID 83209, USA.

Publication

J Comput Aided Mol Des. 2004 Jul-Sep;18(7-9):537-47. doi: 10.1007/s10822-004-4063-5.

Abstract

Modeling quantitative structure-activity relationships (QSAR) is considered with an emphasis on prediction. An abundance of methods is available to develop such models. Using a harmonious approach that balances the bias and variance of predictions, the best calibration models are identified relative to the bias and variance criteria used. The criteria used to determine the adequacy of models are the root mean square error of calibration (RMSEC) and of validation (RMSEV), the respective R² values, and the norm of the regression vector. QSAR data from the literature are used to demonstrate the concepts. For these data sets and the criteria used, it is suggested that models obtained by ridge regression (RR) are more harmonious and parsimonious than models obtained by partial least squares (PLS) and principal component regression (PCR) when the data are mean-centered. The most harmonious RR models have the best bias/variance tradeoff, reflected by the smallest RMSEC, RMSEV, and regression vector norms and the largest calibration and validation R² values. The most parsimonious RR models have the smallest effective rank.
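The criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `ridge_path` and its arguments are hypothetical, and the effective rank is assumed here to be the sum of the ridge filter factors, which is one common definition and may differ from the one used in the paper. The sketch mean-centers the data with the calibration means, solves ridge regression through the SVD, and reports RMSEC, RMSEV, the regression vector norm, and the effective rank for each ridge parameter.

```python
import numpy as np

def ridge_path(X_cal, y_cal, X_val, y_val, lambdas):
    """Hypothetical helper: ridge diagnostics for each ridge parameter.

    Returns a list of tuples
    (lambda, RMSEC, RMSEV, norm of regression vector, effective rank).
    """
    # Mean-center with the calibration means, as the abstract assumes.
    x_mean = X_cal.mean(axis=0)
    y_mean = y_cal.mean()
    Xc, yc = X_cal - x_mean, y_cal - y_mean
    Xv, yv = X_val - x_mean, y_val - y_mean

    # The SVD of the centered calibration matrix gives every ridge solution.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Uty = U.T @ yc

    rows = []
    for lam in lambdas:  # lam > 0 is assumed
        # Ridge regression vector: b = V diag(s / (s^2 + lam)) U^T y
        b = Vt.T @ (s / (s**2 + lam) * Uty)
        rmsec = np.sqrt(np.mean((yc - Xc @ b) ** 2))
        rmsev = np.sqrt(np.mean((yv - Xv @ b) ** 2))
        # Assumed definition: effective rank = sum of filter factors.
        eff_rank = np.sum(s**2 / (s**2 + lam))
        rows.append((lam, rmsec, rmsev, np.linalg.norm(b), eff_rank))
    return rows
```

Plotting RMSEC or RMSEV (a bias measure) against the regression vector norm (a variance measure) across the ridge parameters gives the kind of tradeoff curve from which a harmonious model could be picked.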
