He Linnan, Jurs Peter C
Department of Chemistry, The Pennsylvania State University, 104 Chemistry Building, University Park, PA 16802, USA.
J Mol Graph Model. 2005 Jun;23(6):503-23. doi: 10.1016/j.jmgm.2005.03.003.
Quantitative structure-activity relationship (QSAR) modeling is one of the well-developed areas of computational chemistry, and many successful predictive models have been developed for property, activity, and toxicity prediction. However, the predictive power of these models for new query compounds, and hence their breadth of applicability, is often not well characterized. In other words, given a QSAR model and a specific query compound, can the model be used reliably for the desired prediction? In this study, we assessed the reliability of QSAR model predictions for query compounds. Our approach, which employs hierarchical clustering, was developed and tested using a test dataset of 322 organic compounds with fathead minnow acute aquatic toxicity as the activity of interest. The underlying hypothesis was that the more similar a query compound is to the compounds used to generate the QSAR model, the more accurately it should be predicted. Thus, the core of the approach is to determine the relationship between the similarity of query compounds to the model's training set compounds and the prediction accuracy given by that model. This relationship was determined by comparing the results of the two major components of the approach: object clustering and activity prediction. With the resulting information from the two steps, a direct relationship was shown.
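The idea of relating similarity to prediction accuracy can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the descriptors, the linkage criterion, or the QSAR model type, so the average-linkage clustering, the Euclidean distance, the cluster-mean "model", and every number below are assumptions chosen only to make the example self-contained. All function names are hypothetical.

```python
import math

def euclid(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a list of descriptor vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def agglomerate(points, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_sum = sum(euclid(points[i], points[j])
                               for i in clusters[a] for j in clusters[b])
                d = pair_sum / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

def nearest_cluster(query, points, clusters):
    """Return (cluster index, distance) for the closest training-cluster centroid."""
    dists = [euclid(query, centroid([points[i] for i in c])) for c in clusters]
    k = min(range(len(clusters)), key=dists.__getitem__)
    return k, dists[k]

def predict(query, points, activities, clusters):
    """Toy stand-in for a QSAR model: mean activity of the nearest cluster."""
    k, _ = nearest_cluster(query, points, clusters)
    members = clusters[k]
    return sum(activities[i] for i in members) / len(members)

# Made-up two-descriptor training set with made-up activities (illustration only).
train_x = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
train_y = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
clusters = agglomerate(train_x, n_clusters=2)

# Made-up query compounds as (descriptors, observed activity) pairs.
queries = [([0.05, 0.05], 1.0), ([2.5, 2.0], 1.8)]

# For each query, pair its distance to the training set with its absolute error.
results = []
for x, observed in queries:
    _, distance = nearest_cluster(x, train_x, clusters)
    error = abs(predict(x, train_x, train_y, clusters) - observed)
    results.append((distance, error))

for distance, error in results:
    print(f"distance to nearest training cluster = {distance:.3f}, "
          f"absolute error = {error:.3f}")
```

With a real dataset, the (distance, error) pairs would be collected over many query compounds and then binned or plotted to see whether error grows as similarity to the training set falls, which is the relationship the abstract describes.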