a Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA), CONICET, UNLP, La Plata, Argentina.
b Departamento de Química, Facultad de Ciencias Exactas y Naturales, Universidad de Belgrano, Buenos Aires, Argentina.
SAR QSAR Environ Res. 2017 Sep;28(9):749-763. doi: 10.1080/1062936X.2017.1377765. Epub 2017 Oct 2.
The ANTARES dataset is a large collection of verified experimental bioconcentration factor (BCF) data covering 851 highly heterogeneous compounds, of which 159 are pesticides. The ANTARES BCF data were used to derive a conformation-independent QSPR model. A large set of 27,017 molecular descriptors was explored in order to capture the structural characteristics most relevant to the studied property. The structural descriptors were computed with several freeware tools (PaDEL, Epi Suite, CORAL, Mold, RECON, and QuBiLs-MAS), so it was of interest to examine how the different descriptor tools complemented each other in improving the statistical quality of the resulting QSPR. The best multivariable linear regression models were found with the Replacement Method variable subset selection technique. The proposed QSPR model improves on previously reported models of the bioconcentration factor for this dataset.
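To illustrate the idea of choosing a small descriptor subset for a multivariable linear regression by replacement, here is a minimal sketch in plain Python. It is a simplified variant and not the authors' implementation: it sweeps the positions in order and scores by residual sum of squares, whereas the published Replacement Method has its own scoring and replacement-order rules. The names `fit_rss` and `replacement_search` are hypothetical, and the data are synthetic, not ANTARES values.

```python
import random

def fit_rss(X, y, cols):
    """Residual sum of squares of a least-squares fit of y on the chosen
    descriptor columns plus an intercept (normal equations, Gauss-Jordan)."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(cols) + 1
    # Augmented normal-equation matrix [R^T R | R^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))
        if abs(A[piv][k]) < 1e-12:
            return float("inf")  # collinear subset: treat as unusable
        A[k], A[piv] = A[piv], A[k]
        for i in range(p):
            if i != k:
                f = A[i][k] / A[k][k]
                A[i] = [a - f * b for a, b in zip(A[i], A[k])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))

def replacement_search(X, y, d, seed=0):
    """Start from a random subset of d descriptors, then repeatedly try
    replacing each selected descriptor with every unselected one, keeping
    any swap that lowers the RSS, until no swap helps."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    chosen = rng.sample(range(n_desc), d)
    best = fit_rss(X, y, chosen)
    improved = True
    while improved:
        improved = False
        for pos in range(d):
            for cand in range(n_desc):
                if cand in chosen:
                    continue
                trial = chosen[:pos] + [cand] + chosen[pos + 1:]
                rss = fit_rss(X, y, trial)
                if rss < best - 1e-12:
                    chosen, best, improved = trial, rss, True
    return sorted(chosen), best

# Synthetic toy data (assumption): 40 "compounds", 6 "descriptors";
# the response depends exactly on descriptors 1 and 4.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(40)]
y = [0.5 + 2.0 * x[1] - 3.0 * x[4] for x in X]
selected, rss = replacement_search(X, y, d=2)
```

The point of replacing one descriptor at a time is that each pass costs roughly d times the pool size in model fits, which stays tractable for a pool of tens of thousands of descriptors where an exhaustive search over all subsets would not.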