Sustainable Technology Division, National Risk Management Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, 26 W. Martin Luther King Dr., Cincinnati, OH, 45268, USA.
Toxicol Mech Methods. 2008;18(2-3):251-66. doi: 10.1080/15376510701857353.
ABSTRACT A quantitative structure-activity relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. The methodology uses Ward's method to divide a training set into a series of structurally similar clusters, with structural similarity defined in terms of 2-D physicochemical descriptors (such as connectivity and E-state indices). A genetic algorithm-based technique generates statistically valid QSAR models for each cluster from the same pool of descriptors. The toxicity of a query compound is estimated as the weighted average of the predictions from the closest cluster at each step of the hierarchical clustering, provided that the compound falls within that cluster's domain of applicability. The methodology was tested on a Tetrahymena pyriformis acute toxicity data set with 644 chemicals in the training set and two prediction sets of 339 and 110 chemicals. The results were compared with those from several other QSAR methodologies.
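The prediction scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the per-cluster QSAR model is replaced by the cluster's mean toxicity (the abstract describes genetic algorithm-selected models, which are not reproduced here), the domain of applicability is approximated by a hypothetical size-and-radius check, and the weight `1 / (1 + distance)` is an assumed choice, since the abstract does not state how the weights are computed. The names `Cluster`, `ward_levels`, and `estimate` are likewise hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass
class Cluster:
    points: list  # descriptor vectors of the member chemicals
    tox: list     # measured toxicity of each member

    @property
    def centroid(self):
        n = len(self.points)
        return [sum(col) / n for col in zip(*self.points)]

    def radius(self):
        # Distance from the centroid to the farthest member.
        c = self.centroid
        return max(dist(c, p) for p in self.points)

    def predict(self):
        # Stand-in for the per-cluster QSAR model (assumption):
        # the mean toxicity of the cluster members.
        return sum(self.tox) / len(self.tox)


def ward_cost(a, b):
    # Ward's criterion: increase in the within-cluster sum of squares
    # that would result from merging clusters a and b.
    na, nb = len(a.points), len(b.points)
    return na * nb / (na + nb) * dist(a.centroid, b.centroid) ** 2


def ward_levels(X, y):
    """Return the list of clusters present at every agglomeration step."""
    clusters = [Cluster([x], [t]) for x, t in zip(X, y)]
    levels = [list(clusters)]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: ward_cost(*p))
        merged = Cluster(a.points + b.points, a.tox + b.tox)
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(merged)
        levels.append(list(clusters))
    return levels


def estimate(query, levels, min_size=2):
    """Weighted average of predictions from the closest cluster per step."""
    preds, weights = [], []
    for clusters in levels:
        nearest = min(clusters, key=lambda c: dist(c.centroid, query))
        d = dist(nearest.centroid, query)
        # Hypothetical applicability domain: the cluster has at least
        # min_size members and the query lies within its radius.
        if len(nearest.points) < min_size or d > nearest.radius():
            continue
        preds.append(nearest.predict())
        weights.append(1.0 / (1.0 + d))  # assumed weighting scheme
    if not preds:
        return None  # query is outside every cluster's domain
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


if __name__ == "__main__":
    # Toy one-descriptor data set, invented for illustration only.
    X = [[0.0], [1.0], [10.0], [11.0]]
    y = [1.0, 2.0, 5.0, 6.0]
    print(estimate([0.5], ward_levels(X, y)))
```

Returning `None` when no cluster passes the domain check mirrors the abstract's condition that a prediction is made only for compounds inside a cluster's domain of applicability. The pairwise search in `ward_levels` is kept simple for readability and would be too slow for a training set of 644 chemicals; a library routine would be the practical choice there.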