Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida, USA.
J Toxicol Environ Health A. 2009;72(19):1181-90. doi: 10.1080/15287390903091863.
The experimental EC50 toxicities toward Daphnia magna of a series of 130 benzoic acids, benzaldehydes, phenylsulfonyl acetates, cycloalkane-carboxylates, benzanilides, and other esters were studied using the best multilinear regression (BMLR) algorithm implemented in CODESSA. A modified quantitative structure-activity relationship (QSAR) procedure was applied to ensure the stability and reproducibility of the results. Separating the initial data set into training and test subsets generated three independent models with an average R² of .735. A five-descriptor general model including all 130 compounds, constructed using the descriptors found effective for the independent subsets, was characterized by the following statistical parameters: R² = .712; cross-validated R² (R²cv) = .676; F = 61.331; s² = 0.6. Removing two extreme outliers significantly improved the statistical parameters: R² = .759; R²cv = .728; F = 77.032; s² = 0.499. The sensitivity of the general model to chance correlations was estimated by applying a scrambling procedure involving 20 randomizations of the original property values. The resulting R² = .192, well below the values obtained for the unscrambled data, demonstrated the high robustness of the proposed model. The descriptors appearing in the obtained models are related to the biochemical nature of the adverse effects. An additional study of the EC50/LC50 relationship for a series of 28 compounds drawn from the general data set revealed that the two endpoints were strongly correlated (R² = .98).
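The scrambling (y-randomization) test described in the abstract can be illustrated with a minimal sketch. This is not the CODESSA implementation: the data below are synthetic, the function names `r_squared` and `scrambled_r_squared` are hypothetical, and the descriptors are held fixed during scrambling, whereas the published procedure may have repeated descriptor selection for each randomization. The sketch only shows the general idea: refit the regression after shuffling the property values and compare the resulting R² with that of the unscrambled model.

```python
import numpy as np


def r_squared(X, y):
    """Ordinary least-squares R^2 for y regressed on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def scrambled_r_squared(X, y, n_rounds=20, seed=0):
    """Refit the model n_rounds times, each time on shuffled property values."""
    rng = np.random.default_rng(seed)
    return np.array([r_squared(X, rng.permutation(y)) for _ in range(n_rounds)])


# Synthetic stand-in data (assumption): 130 "compounds", 5 "descriptors".
rng = np.random.default_rng(42)
X = rng.normal(size=(130, 5))
true_coef = np.array([1.0, -0.8, 0.6, 0.5, -0.4])
y = X @ true_coef + rng.normal(scale=0.9, size=130)

r2_real = r_squared(X, y)
r2_scrambled = scrambled_r_squared(X, y, n_rounds=20)

print(f"R^2 on original property values: {r2_real:.3f}")
print(f"Mean R^2 over 20 scrambled runs: {r2_scrambled.mean():.3f}")
```

A model that reflects a genuine structure-activity relationship should give a much lower R² on the scrambled property values than on the original ones; comparable values would suggest a chance correlation.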