Institute of Chemistry, University of Tartu, Tartu, Estonia.
SAR QSAR Environ Res. 2013;24(3):175-99. doi: 10.1080/1062936X.2012.762426. Epub 2013 Feb 14.
Quantitative structure-activity relationships (QSARs) are broadly classified as global or local, depending on the molecular composition of their training sets. Global models use large, diverse training sets that cover a wide range of chemical space. Local models focus on smaller subsets of structurally or chemically similar compounds, conventionally selected by human experts or, alternatively, by cluster analysis. The current study compares different clustering algorithms (expectation-maximization, K-means and hierarchical), applied to seven different descriptor sets as structural characteristics, together with two rule-based approaches for selecting the subsets used to design local QSAR models. A total of 111 local QSAR models were developed for predicting the bioconcentration factor. Predictions from the local models were compared with the corresponding predictions from the global model. Comparing the coefficients of determination (r²) and standard deviations of the local models with those of the global model on the corresponding subsets shows improved prediction quality in 97% of cases. The descriptor content of the derived QSARs is discussed and analyzed. The local QSAR models were further consolidated within a consensus framework. All of the consensus approaches tested improved performance over both the global and the local models. The consensus approach reduced the number of strongly deviating predictions by evening out the prediction errors produced by some local QSARs.
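The workflow summarized in the abstract (cluster the compounds, fit one local model per cluster, compare each local model with the global model on the same subset, then average several local predictions into a consensus) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic two-descriptor data, a plain NumPy K-means, and ordinary least-squares linear models in place of the paper's descriptor sets and QSAR methodology. All function and variable names are hypothetical, and the comparison is in-sample only, so it says nothing about external predictivity.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means (illustrative stand-in); returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    # Final assignment so that labels match the returned centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers

def fit_linear(X, y):
    """Least-squares linear model with intercept (assumed model form)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def local_predictions(X, y, labels):
    """Fit one local model per cluster; return in-sample predictions."""
    y_hat = np.empty_like(y)
    for c in np.unique(labels):
        m = labels == c
        y_hat[m] = predict_linear(fit_linear(X[m], y[m]), X[m])
    return y_hat

# Synthetic data (an assumption): two chemical "classes" with different trends.
rng = np.random.default_rng(42)
X_a = rng.normal(0.0, 1.0, size=(60, 2))
X_b = rng.normal(6.0, 1.0, size=(60, 2))
X = np.vstack([X_a, X_b])
y = np.concatenate([2.0 * X_a[:, 0], -1.5 * X_b[:, 0] + 3.0])
y = y + rng.normal(0.0, 0.2, size=len(y))

# Global model: one fit over all compounds.
global_pred = predict_linear(fit_linear(X, y), X)

# Local models: one fit per K-means cluster, compared on the same subset.
labels, _ = kmeans(X, k=2)
local_pred = local_predictions(X, y, labels)
comparison = []
for c in np.unique(labels):
    m = labels == c
    comparison.append((int(c), r2(y[m], local_pred[m]), r2(y[m], global_pred[m])))

# Consensus: average local predictions obtained from several clusterings.
clusterings = [kmeans(X, k, seed=s)[0] for k, s in [(2, 0), (3, 1), (4, 2)]]
consensus_pred = np.mean([local_predictions(X, y, lab) for lab in clusterings], axis=0)

for c, r2_local, r2_global in comparison:
    print(f"cluster {c}: local r2 = {r2_local:.3f}, global r2 = {r2_global:.3f}")
print(f"overall: global r2 = {r2(y, global_pred):.3f}, "
      f"consensus r2 = {r2(y, consensus_pred):.3f}")
```

Because each local least-squares model is fitted to its own subset, its in-sample r² on that subset cannot be lower than that of the global model, and averaging the local predictions cannot give a larger in-sample squared error than the global fit. A faithful reproduction of the study's comparison would need validated models and the actual descriptor sets.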