Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China; School of Environmental Science and Engineering, Shandong University, Qingdao 266237, China.
Institute of Environmental Research at Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China.
J Hazard Mater. 2022 Jun 5;431:128558. doi: 10.1016/j.jhazmat.2022.128558. Epub 2022 Feb 23.
Quantitative structure-activity relationship (QSAR) modeling has been widely used to predict the potential harm of chemicals, but its predictions rely heavily on accurate annotation of chemical structures. In many cases, however, such as in complex water environments, the exact structure of an unknown compound is difficult to determine. Here, we addressed this problem by linking the electron ionization mass spectra (EI-MS) of organic chemicals to toxicity endpoints through various machine learning methods. The proposed method was validated by predicting the 50% growth inhibition of Tetrahymena pyriformis (T. pyriformis) and liver toxicity. The optimal models achieved an R > 0.7 or a balanced accuracy > 0.72 on both the training and test sets. External experiments further demonstrated the potential of the method for predicting the toxicity of unknown chemicals. Feature importance analysis identified critical spectral features responsible for chemical-induced toxicity. Our approach could therefore support toxicity prediction in fields where accurate chemical structures are difficult to determine.
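The abstract does not describe the featurization or the specific learners, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' pipeline. It assumes each EI-MS spectrum is a list of (m/z, intensity) peaks binned at nominal mass into a fixed-length vector (the cutoff `MAX_MZ = 500` is hypothetical). A similarity-weighted k-nearest-neighbor regressor stands in for "various machine learning methods", and R is assumed to be the Pearson correlation coefficient. Balanced accuracy is the mean of sensitivity and specificity. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# ASSUMPTION: nominal-mass binning from m/z 1 to MAX_MZ; the paper's exact
# featurization is not stated in the abstract, and 500 is a hypothetical cutoff.
MAX_MZ = 500


def bin_spectrum(peaks: Sequence[Tuple[float, float]], max_mz: int = MAX_MZ) -> List[float]:
    """Map (m/z, intensity) peaks onto nominal-mass bins, scaled so the base peak is 1.0."""
    vec = [0.0] * max_mz
    for mz, intensity in peaks:
        idx = int(round(mz)) - 1  # index 0 holds m/z 1
        if 0 <= idx < max_mz:  # peaks above max_mz are dropped
            vec[idx] = max(vec[idx], intensity)  # keep the strongest peak per bin
    base = max(vec)
    return [v / base for v in vec] if base > 0 else vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two binned spectra (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def knn_predict(train_X: List[List[float]], train_y: List[float],
                x: List[float], k: int = 3) -> float:
    """Stand-in learner (NOT necessarily the paper's): similarity-weighted k-NN regression."""
    scored = sorted(((cosine(xi, x), yi) for xi, yi in zip(train_X, train_y)),
                    key=lambda p: p[0], reverse=True)[:k]
    total = sum(s for s, _ in scored)
    if total == 0:
        return sum(y for _, y in scored) / len(scored)  # no spectral overlap: plain mean
    return sum(s * y for s, y in scored) / total


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient R between observed and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = toxic, 0 = non-toxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)
```

For example, `bin_spectrum([(43.0, 50.0), (58.0, 100.0)])` places 1.0 at index 57 (m/z 58, the base peak) and 0.5 at index 42 (m/z 43). Because each vector index corresponds to one nominal m/z, importance scores from whatever learner is used can be read back directly as spectral features, which is the kind of interpretation the feature importance analysis relies on.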