Kalivas John H, Forrester Joel B, Seipel Heather A
Department of Chemistry, Idaho State University, Pocatello, ID 83209, USA.
J Comput Aided Mol Des. 2004 Jul-Sep;18(7-9):537-47. doi: 10.1007/s10822-004-4063-5.
Modeling quantitative structure-activity relationships (QSAR) is considered with an emphasis on prediction. Many methods are available for developing such models. A harmonious approach that balances the bias and variance of predictions is used to identify the best calibration models with respect to the chosen bias and variance criteria. The criteria used to assess model adequacy are the root mean square errors of calibration (RMSEC) and validation (RMSEV), the corresponding calibration and validation R² values, and the norm of the regression vector. QSAR data sets from the literature are used to demonstrate the concepts. For these data sets and criteria, the results suggest that, when the data are mean-centered, models obtained by ridge regression (RR) are more harmonious and parsimonious than those obtained by partial least squares (PLS) and principal component regression (PCR). The most harmonious RR models have the best bias/variance tradeoff, reflected by the smallest RMSEC, RMSEV, and regression vector norms and the largest calibration and validation R² values. The most parsimonious RR models have the smallest effective rank.
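The criteria named in the abstract can be illustrated with a short NumPy sketch. It fits ridge models on mean-centered calibration data over a grid of ridge parameters and reports RMSEC, RMSEV, calibration and validation R², the regression vector norm, and an effective rank. Several choices are assumptions, not taken from the paper. The ridge parameter `lam` enters as (XᵀX + λI)⁻¹. RMSE divides by the number of samples with no degrees-of-freedom correction. R² is computed as 1 − SS_res/SS_tot. Effective rank is taken as the sum of the filter factors s²/(s² + λ), which is one common definition. The data are synthetic, not the authors' QSAR sets. The function names `ridge_fit` and `evaluate` are hypothetical. The sketch only tabulates the criteria and does not reproduce the authors' harmonious model-selection procedure.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge regression on mean-centered data via the SVD (assumes lam > 0).

    Returns the regression vector b, the column means of X, the mean of y,
    and the effective rank, assumed here to be sum(s^2 / (s^2 + lam)).
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # b = V diag(s / (s^2 + lam)) U^T yc, equal to (Xc^T Xc + lam I)^-1 Xc^T yc
    b = Vt.T @ ((s / (s**2 + lam)) * (U.T @ yc))
    eff_rank = np.sum(s**2 / (s**2 + lam))  # filter factors, each in [0, 1)
    return b, x_mean, y_mean, eff_rank


def rmse(y, yhat):
    # Root mean square error, dividing by n (assumed; no df correction)
    return np.sqrt(np.mean((y - yhat) ** 2))


def r2(y, yhat):
    # Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def evaluate(Xcal, ycal, Xval, yval, lam):
    """Fit on the calibration set and report the bias/variance criteria."""
    b, xm, ym, k = ridge_fit(Xcal, ycal, lam)
    # Validation samples are centered with the calibration means
    yhat_cal = (Xcal - xm) @ b + ym
    yhat_val = (Xval - xm) @ b + ym
    return {
        "lambda": lam,
        "RMSEC": rmse(ycal, yhat_cal),
        "RMSEV": rmse(yval, yhat_val),
        "R2_cal": r2(ycal, yhat_cal),
        "R2_val": r2(yval, yhat_val),
        "norm_b": np.linalg.norm(b),
        "eff_rank": k,
    }


# Hypothetical synthetic data standing in for a QSAR descriptor matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=40)
Xcal, ycal, Xval, yval = X[:30], y[:30], X[30:], y[30:]

for lam in np.logspace(-3, 3, 7):
    r = evaluate(Xcal, ycal, Xval, yval, lam)
    print(f"lambda={r['lambda']:9.3g}  RMSEC={r['RMSEC']:.3f}  "
          f"RMSEV={r['RMSEV']:.3f}  R2cal={r['R2_cal']:.3f}  "
          f"R2val={r['R2_val']:.3f}  ||b||={r['norm_b']:.3f}  "
          f"eff_rank={r['eff_rank']:.2f}")
```

Reading down the printed rows shows the tradeoff in question: as λ grows, ‖b‖ and the effective rank shrink while RMSEC rises, and RMSEV will often, though not always, bottom out at an intermediate λ.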