Grohmann Rainer, Schindler Torsten
Institute for Theoretical Chemistry, University of Vienna, Austria.
J Comput Chem. 2008 Apr 30;29(6):847-60. doi: 10.1002/jcc.20831.
Regression approaches widely used to model quantitative structure-property relationships, such as PLS regression, are highly susceptible to outlying observations, which impair the predictive value of a model. Our aim is to compile homogeneous datasets as a basis for regression modeling by removing outlying compounds and applying variable selection. We investigate different approaches to building robust, outlier-resistant regression models for predicting the permeability of drug molecules. The objective is to combine the strengths of outlier detection and variable elimination in order to increase the predictive power of the regression models. In conclusion, outlier detection is employed to identify multiple homogeneous data subsets for regression modeling.
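The general idea of pairing outlier removal with variable selection before fitting a regression model can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify their outlier-detection or variable-selection procedures, so this example assumes a simple iterative residual rule (a robust cutoff based on the median absolute deviation) and a correlation filter, and it substitutes ordinary least squares for PLS regression to stay dependency-free. The data are synthetic, and all function names, thresholds, and the contamination pattern are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def remove_outliers(X, y, cutoff=3.0, max_rounds=10):
    """Iteratively flag compounds whose residual lies more than `cutoff`
    robust standard deviations (MAD-based) from the median residual."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(max_rounds):
        coef = fit_ols(X[keep], y[keep])       # fit on currently kept compounds
        resid = y - predict(coef, X)           # residuals for all compounds
        med = np.median(resid[keep])
        mad = np.median(np.abs(resid[keep] - med))
        scale = max(1.4826 * mad, 1e-12)       # guard against a zero scale
        new_keep = np.abs(resid - med) <= cutoff * scale
        if np.array_equal(new_keep, keep):     # stop once the subset is stable
            break
        keep = new_keep
    return keep

def select_variables(X, y, min_abs_corr=0.4):
    """Keep descriptors whose absolute Pearson correlation with y reaches the threshold."""
    corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(corr) >= min_abs_corr)

# Synthetic stand-in for a descriptor matrix and a measured property (assumption).
rng = np.random.default_rng(0)
n_compounds, n_descriptors = 100, 6
X = rng.normal(size=(n_compounds, n_descriptors))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_compounds)
y[:5] += 10.0                                  # five artificially outlying compounds

keep = remove_outliers(X, y)                   # homogeneous subset of compounds
selected = select_variables(X[keep], y[keep])  # informative descriptors
coef = fit_ols(X[keep][:, selected], y[keep])  # final model on the cleaned data

print("compounds kept:", int(keep.sum()), "of", n_compounds)
print("descriptors kept:", selected.tolist())
print("coefficients (intercept first):", np.round(coef, 3).tolist())
```

In this sketch the outlier filter runs before variable selection so that the correlation estimates are not distorted by the contaminated compounds; a real workflow might iterate between the two steps or replace `fit_ols` with a PLS implementation.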