Zhang Ying, Xie Liangxu, Zhang Dawei, Xu Xiaojun, Xu Lei
Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China.
Molecules. 2023 Nov 7;28(22):7457. doi: 10.3390/molecules28227457.
Persistent organic pollutants (POPs) are ubiquitous and bioaccumulative, posing potential and long-term threats to human health and the ecological environment. Quantitative structure-activity relationship (QSAR) studies play a guiding role in analyzing the toxicity and environmental fate of different organic pollutants. In this work, five molecular descriptors are used to construct QSAR models that predict the mean and maximum air half-lives of POPs: the energy of the highest occupied molecular orbital (HOMO_Energy_DMol3), the z-axis component of the dipole moment (Dipole_Z), the fragment contribution to SAscore (SAscore_Fragments), subgraph counts (SC_3_P), and structural information content (SIC). The QSAR models were built with three machine learning methods: partial least squares (PLS), multiple linear regression (MLR), and genetic function approximation (GFA). For the mean air half-life, the determination coefficients (R²) and relative errors of the models are 0.916 and 3.489% (PLS), 0.939 and 5.048% (MLR), and 0.938 and 5.131% (GFA), respectively. For the maximum air half-life, the corresponding R² values and relative errors are 0.915 and 5.629% (PLS), 0.940 and 10.090% (MLR), and 0.939 and 11.172% (GFA), respectively. Furthermore, the mechanisms underlying the significant factors that affect the air half-lives of POPs have been explored. The three regression models show good predictive and extrapolation abilities for POPs within the application domain.
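To make the reported figures of merit concrete, the sketch below fits a multiple linear regression (one of the three methods named above) on five descriptor columns and scores it with R² and a relative error. It is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic (randomly generated, not the paper's POP dataset). The descriptor names serve only as column labels. The relative error is assumed to be the mean absolute relative error in percent, since the abstract does not define it. The function names (fit_mlr, make_synthetic_data, etc.) are hypothetical. PLS and GFA are not shown.

```python
import random

# Column labels only; the values generated below are synthetic, not real descriptors.
DESCRIPTORS = ["HOMO_Energy_DMol3", "Dipole_Z", "SAscore_Fragments", "SC_3_P", "SIC"]


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system: descriptors may be collinear")
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend intercept column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(XtX, Xty)


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_relative_error(y_true, y_pred):
    """Mean absolute relative error in percent (an assumed definition of 'relative error')."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def make_synthetic_data(n=40, seed=0):
    """Hypothetical linear data with known coefficients and small Gaussian noise."""
    rng = random.Random(seed)
    true_coef = [10.0, 0.5, -0.3, 0.8, 0.1, -0.4]  # intercept first
    X = [[rng.gauss(0.0, 1.0) for _ in DESCRIPTORS] for _ in range(n)]
    y = [
        true_coef[0] + sum(c * v for c, v in zip(true_coef[1:], x)) + rng.gauss(0.0, 0.1)
        for x in X
    ]
    return X, y, true_coef


if __name__ == "__main__":
    X, y, _ = make_synthetic_data()
    coef = fit_mlr(X, y)
    y_hat = predict(coef, X)
    print(f"R^2 = {r_squared(y, y_hat):.3f}, RE = {mean_relative_error(y, y_hat):.3f}%")
```

Here both metrics are computed on the fitting data for brevity. A real QSAR workflow would also report them on a held-out test set and check that query compounds fall inside the model's application domain before trusting a prediction.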