Supratik Kar, Kunal Roy, Jerzy Leszczynski
Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, MS, USA.
Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India.
Methods Mol Biol. 2018;1800:141-169. doi: 10.1007/978-1-4939-7899-1_6.
In the context of human safety assessment through quantitative structure-activity relationship (QSAR) modeling, the concept of the applicability domain (AD) plays an enormous role. Among the Organisation for Economic Co-operation and Development (OECD) principles for QSAR model validation, principle 3 recommends that a predictive QSAR model have "a defined domain of applicability." Studying the AD makes it possible to estimate the uncertainty in the prediction for a particular molecule based on how similar that molecule is to the training compounds used to develop the model. AD is currently an active research topic, and many methods have been designed to estimate the competence of a model and the confidence in its outcome for a given prediction task. Characterizing the interpolation space is thus central to defining the AD. The many AD methods reported so far were constructed from different hypotheses and algorithms. This multiplicity of methodologies confuses end users and makes comparing the AD of different models a complex problem. In this chapter, we summarize the important concepts of AD, including details of the available methods for computing the AD, along with their thresholds and criteria for estimating the AD through training set interpolation in the descriptor space. The concepts of the transparent domain and the decision domain are also discussed. To help readers determine the AD in their own projects, practical examples together with available open-source software tools are provided.
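The abstract refers to thresholds for deciding whether a query compound falls within the training set's interpolation space in descriptor space. One widely used criterion of this kind is the leverage approach, shown here only as an illustration and not necessarily the chapter's exact procedure: the leverage of a compound is h = x^T (X^T X)^-1 x, where X is the training descriptor matrix, and a compound is flagged as outside the AD when h exceeds the warning leverage h* = 3(p + 1)/n, with p descriptors and n training compounds. The sketch below is written in pure Python under stated assumptions: descriptors are already computed, the model includes an intercept (hence the column of ones and the "+ 1"), and the class name `LeverageAD` is hypothetical.

```python
"""Leverage-based applicability domain check: a minimal illustrative sketch."""
from typing import List, Sequence

Matrix = List[List[float]]


def _with_intercept(x: Sequence[float]) -> List[float]:
    # Assumption: the QSAR model has an intercept, so prepend a constant 1.0.
    return [1.0] + [float(v) for v in x]


def _invert(a: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a copy of A with the identity matrix: [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; descriptors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    # The right half now holds A^-1.
    return [row[n:] for row in aug]


class LeverageAD:
    """Hypothetical helper: flags compounds whose leverage exceeds h* = 3(p + 1)/n."""

    def __init__(self, training: Sequence[Sequence[float]]):
        rows = [_with_intercept(x) for x in training]
        n, k = len(rows), len(rows[0])  # k = p + 1 columns (descriptors + intercept)
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
        self._xtx_inv = _invert(xtx)
        self.threshold = 3.0 * k / n  # warning leverage h*

    def leverage(self, x: Sequence[float]) -> float:
        """Return h = x^T (X^T X)^-1 x for one descriptor vector."""
        v = _with_intercept(x)
        return sum(v[i] * self._xtx_inv[i][j] * v[j]
                   for i in range(len(v)) for j in range(len(v)))

    def in_domain(self, x: Sequence[float]) -> bool:
        """True if the compound lies within the leverage-based AD."""
        return self.leverage(x) <= self.threshold


if __name__ == "__main__":
    # Toy data: one descriptor, five training compounds (values invented for illustration).
    ad = LeverageAD([[1.0], [2.0], [3.0], [4.0], [5.0]])
    print(ad.threshold)         # 1.2
    print(ad.in_domain([5.0]))  # True
    print(ad.in_domain([10.0])) # False
```

In this toy example, h* = 3 × 2 / 5 = 1.2; a query value of 5.0 has leverage 0.6 and lies inside the domain, while 10.0 has leverage 5.1 and is an extrapolation. In practice, the leverage check is often paired with standardized residuals (a Williams plot) so that both structural and response outliers are visible, and a numerical library would normally replace the hand-written matrix inverse.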