Wu Shuang, Li Shi-Xin, Qiu Jing, Zhao Hai-Ming, Li Yan-Wen, Feng Nai-Xian, Liu Bai-Lin, Cai Quan-Ying, Xiang Lei, Mo Ce-Hui, Li Qing X
Guangdong Provincial Research Center for Environment Pollution Control and Remediation Materials, College of Life Science and Technology, Jinan University, Guangzhou 510632, China.
Department of Molecular Biosciences and Bioengineering, University of Hawaii at Manoa, Honolulu, Hawaii, 96822, United States.
Environ Sci Technol. 2024 Aug 13. doi: 10.1021/acs.est.4c03966.
Acute oral toxicity data are currently unavailable for most polycyclic aromatic hydrocarbons (PAHs), and especially for their derivatives, because determining them all experimentally is cost-prohibitive. Here, machine learning (ML)-based quantitative structure-activity relationship (QSAR) models were developed to predict the toxicity of PAH derivatives, using rat oral toxicity data for 788 individual substances. Both the individual ML algorithm gradient boosting regression trees (GBRT) and the stacking ML algorithm (extreme gradient boosting + GBRT + random forest regression) gave the best predictions, with satisfactory coefficients of determination for both cross-validation and the test set. PAH derivatives with fewer polar hydrogens, more large-sized atoms, more branching, and lower polarizability were found to be more toxic. Software built on the optimal ML-QSAR model was developed to broaden the model's applicability, and it yielded reliable predictions of pLD50 values and reference doses for 6893 external PAH derivatives. Of these chemicals, 472 were identified as moderately or highly toxic, and 10 of those have documented records of environmental detection or use. The findings provide valuable insight into the toxicity of PAHs and their derivatives and offer a standard platform for effectively evaluating chemical toxicity with ML-QSAR models.