Toropov Andrey A, Toropova Alla P, Marzo Marco, Dorne Jean Lou, Georgiadis Nikolaos, Benfenati Emilio
Department of Environmental Health Science, Laboratory of Environmental Chemistry and Toxicology, IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, 20156 Milano, Italy.
Environ Toxicol Pharmacol. 2017 Jul;53:158-163. doi: 10.1016/j.etap.2017.05.011. Epub 2017 May 23.
Optimal (flexible) descriptors were used to establish quantitative structure-activity relationships (QSAR) for the toxicity of pesticides (n=116) towards rainbow trout. A heterogeneous set of hundreds of pesticides was used, taken from EFSA's Chemical Hazards Database, OpenFoodTox. Optimal descriptors are calculated from the simplified molecular input-line entry system (SMILES) representation. The so-called correlation weights of different SMILES fragments are calculated by a Monte Carlo optimization procedure in which the correlation coefficient between the endpoint and the optimal descriptor serves as the target function. Once the correlation coefficient for the training set has been maximized, one can expect that the optimal descriptor calculated with these correlation weights will also correlate with the endpoint for the external validation set. This approach was checked with three different splits into a training set (≈85%) and an external validation set (≈15%). The statistical characteristics of these models are: (i) for the training set, the correlation coefficient (r) ranges from 0.72 to 0.81 and the root mean squared error (RMSE) from 0.54 to 1.25; (ii) for the external validation set, r ranges from 0.74 to 0.84 and RMSE from 0.64 to 0.75. Computational experiments showed that the presence of chlorine, fluorine, sulfur, and aromatic fragments promotes an increase in toxicity.