Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi 110 001, India; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India.
Toxicol Appl Pharmacol. 2014 Mar 15;275(3):198-212. doi: 10.1016/j.taap.2014.01.006. Epub 2014 Jan 23.
Ensemble learning-based decision treeboost (DTB) and decision tree forest (DTF) models are introduced to establish quantitative structure-toxicity relationships (QSTRs) for predicting the toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. The structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models, supplemented with stochastic gradient boosting and bagging algorithms, respectively, were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was paid to the predictive ability and robustness of the models, which were investigated in both external and 10-fold cross-validation. On the complete data, the optimal DTB and DTF models achieved accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classification, respectively. Both models further yielded classification accuracies of 100% on external toxicity data for T. pyriformis. The regression models (DTB and DTF) constructed using five descriptors yielded correlation coefficients (R²) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064, on the complete T. pyriformis data. Applied to the external toxicity data sets, the T. pyriformis regression models (DTB and DTF) yielded R² values of 0.637 and 0.655 with MSEs of 0.534 and 0.507 (marine bacteria), and R² values of 0.741 and 0.691 with MSEs of 0.155 and 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting the toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals.
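The three quantities the abstract relies on (Tanimoto similarity, R² and MSE) can be illustrated with a minimal sketch. It makes no claim to reproduce the authors' software or data. The assumptions are: fingerprints are represented as sets of "on" bit indices, R² is taken as the squared Pearson correlation between measured and predicted values (the abstract calls it a correlation coefficient but does not state the formula), and the function names `tanimoto`, `mse` and `r_squared`, as well as all numbers in the example, are hypothetical.

```python
from math import fsum


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    if union == 0:
        return 1.0  # convention assumed here: two empty fingerprints count as identical
    return len(a & b) / union


def mse(measured, predicted) -> float:
    """Mean squared error between measured and predicted values."""
    n = len(measured)
    return fsum((m - p) ** 2 for m, p in zip(measured, predicted)) / n


def r_squared(measured, predicted) -> float:
    """Squared Pearson correlation (assumed definition of R^2 for this sketch)."""
    n = len(measured)
    mean_m = fsum(measured) / n
    mean_p = fsum(predicted) / n
    cov = fsum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = fsum((m - mean_m) ** 2 for m in measured)
    var_p = fsum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)


# Made-up illustrative inputs, not data from the study.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
print(tanimoto(fp_a, fp_b))  # 2 shared bits / 5 bits in the union = 0.4

measured = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.6, 1.9]
print(mse(measured, predicted))
print(r_squared(measured, predicted))
```

A low average pairwise Tanimoto value across a data set indicates structurally diverse chemicals. If the study instead used the coefficient of determination (one minus the ratio of residual to total sum of squares), the R² values would differ from the squared correlation whenever the predictions are biased.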