Dimitrov Sabcho, Dimitrova Gergana, Pavlov Todor, Dimitrova Nadezhda, Patlewicz Grace, Niemela Jay, Mekenyan Ovanes
Laboratory of Mathematical Chemistry, University "Prof. As. Zlatarov", 8010 Bourgas, Bulgaria.
J Chem Inf Model. 2005 Jul-Aug;45(4):839-49. doi: 10.1021/ci0500381.
A stepwise approach for determining the model applicability domain is proposed. Four stages are applied to account for the diversity and complexity of the current SAR/QSAR models, reflecting their mechanistic rationality (including metabolic activation of chemicals) and transparency. General parametric requirements are imposed in the first stage, specifying in the domain only those chemicals that fall in the range of variation of the physicochemical properties of the chemicals in the training set. The second stage defines the structural similarity between chemicals that are correctly predicted by the model. The structural neighborhood of atom-centered fragments is used to determine this similarity. The third stage in defining the domain is based on a mechanistic understanding of the modeled phenomenon. Here, the model domain combines the reliability of specific reactive groups hypothesized to cause the effect and the domain of explanatory variables determining the parametric requirements in order for functional groups to elicit their reactivity. Finally, the reliability of simulated metabolism (metabolites, pathways, and maps) is taken into account in assessing the reliability of predictions, if metabolic activation of chemicals is a part of the (Q)SAR model. Some of the stages of the proposed approach for defining the model domain can be eliminated depending on the availability and quality of the experimental data used to derive the model, the specificity of (Q)SARs, and the goals of their ultimate application. The performance of the proposed definition of the model domain is tested using several examples of (Q)SARs that have been externally validated, including models for predicting acute toxicity, skin sensitization, and biodegradation. The results clearly showed that credibility in predictions of QSAR models for chemicals belonging to their domain is much higher than for chemicals outside this domain.
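The first two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptor names (`log_kow`, `mol_weight`), the fragment labels, and all function names are hypothetical. The rule that every atom-centered fragment of a query chemical must already occur among correctly predicted training chemicals is a simplifying assumption about how the structural stage might be applied.

```python
from typing import Dict, List, Set, Tuple

Descriptors = Dict[str, float]


def property_ranges(training: List[Descriptors]) -> Dict[str, Tuple[float, float]]:
    """Stage 1 setup: min/max of each descriptor over the training set."""
    names = training[0].keys()
    return {
        n: (min(c[n] for c in training), max(c[n] for c in training))
        for n in names
    }


def in_parametric_domain(chem: Descriptors,
                         ranges: Dict[str, Tuple[float, float]]) -> bool:
    """Stage 1: every descriptor must lie inside the training-set range."""
    return all(lo <= chem[n] <= hi for n, (lo, hi) in ranges.items())


def in_structural_domain(fragments: Set[str], known_fragments: Set[str]) -> bool:
    """Stage 2 (assumed rule): all fragments of the query must be known."""
    return fragments <= known_fragments


def in_domain(chem: Descriptors, fragments: Set[str],
              ranges: Dict[str, Tuple[float, float]],
              known_fragments: Set[str]) -> bool:
    """Apply the stages in sequence; a chemical must pass each one."""
    return (in_parametric_domain(chem, ranges)
            and in_structural_domain(fragments, known_fragments))


# Hypothetical training data, for illustration only.
training_set = [
    {"log_kow": 1.0, "mol_weight": 100.0},
    {"log_kow": 3.5, "mol_weight": 250.0},
    {"log_kow": 2.0, "mol_weight": 180.0},
]
ranges = property_ranges(training_set)

# Hypothetical fragments taken from correctly predicted training chemicals.
known_fragments = {"C(ar)", "O-H", "C=O"}

inside = in_domain({"log_kow": 2.5, "mol_weight": 200.0},
                   {"C(ar)", "O-H"}, ranges, known_fragments)
outside = in_domain({"log_kow": 5.0, "mol_weight": 200.0},
                    {"C(ar)", "O-H"}, ranges, known_fragments)
print(inside, outside)
```

The mechanistic and metabolic stages would be added as further checks in `in_domain`, and any stage could be dropped, in line with the abstract's remark that some stages can be eliminated depending on the model and the data available.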