Gramatica Paola, Cassani Stefano, Roy Partha Pratim, Kovarich Simona, Yap Chun Wei, Papa Ester
QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Via Dunant 3, 21100, Varese, Italy, http://www.qsar.it.
Present Address: Guru Ghasidas University, Bilaspur, Koni, India.
Mol Inform. 2012 Dec;31(11-12):817-35. doi: 10.1002/minf.201200075. Epub 2012 Nov 19.
A case study on the toxicity of (benzo)triazoles ((B)TAZs) to the alga Pseudokirchneriella subcapitata is used to discuss some problems and solutions in QSAR modeling, particularly in the environmental context. The paper describes the relevance of data curation (not only of experimental data, but also of chemical structures and of the input formats used to calculate molecular descriptors) and the crucial points of QSAR model validation and of potential application to new chemicals (internal robustness, exclusion of chance correlation, external predictivity, applicability domain). These points are illustrated by developing MLR-OLS (multiple linear regression, ordinary least squares) models based on molecular descriptors calculated with various QSAR software tools (the commercial DRAGON and the free PaDEL-Descriptor and QSPR-THESAURUS). The utility of consensus models is also highlighted. The work summarizes a methodology for a rigorous statistical approach that yields reliable QSAR predictions, even when starting from limited experimental data, including for a large number of (B)TAZs in the ECHA preregistration list of REACH. It revealed some ambiguities and discrepancies in SMILES notations from different databases, highlighted some general problems in QSAR model generation, and was useful in the implementation of the PaDEL-Descriptor software.
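The four validation points named in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistics the authors used, so everything below is an assumption chosen for illustration: leave-one-out Q² for internal robustness, Y-scrambling for chance correlation, the Q²F1 variant for external predictivity, and a leverage cut-off of h* = 3(p+1)/n for the applicability domain. The data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X.b by ordinary least squares (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def leverages(X_train, X_query):
    """Leverage q^T (A^T A)^-1 q of each query row w.r.t. the training set."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    G = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, G, Q)

def q2_loo(X, y):
    """Leave-one-out Q2, using the OLS identity e_loo = e / (1 - h)."""
    resid = y - predict(fit_ols(X, y), X)
    press = np.sum((resid / (1.0 - leverages(X, X))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def y_scramble_r2(X, y, n_iter, rng):
    """R2 of models refitted on randomly permuted responses."""
    out = []
    for _ in range(n_iter):
        y_perm = rng.permutation(y)
        out.append(r2(y_perm, predict(fit_ols(X, y_perm), X)))
    return np.array(out)

def q2_f1(y_train, y_test, y_test_hat):
    """External Q2 (F1 variant): test residuals vs. deviation from training mean."""
    return 1.0 - np.sum((y_test - y_test_hat) ** 2) / np.sum((y_test - y_train.mean()) ** 2)

# Synthetic stand-in for a descriptor matrix and a toxicity endpoint (not real data).
rng = np.random.default_rng(0)
n_train, n_test, p = 30, 10, 3
X = rng.normal(size=(n_train + n_test, p))
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + 0.2 * rng.normal(size=n_train + n_test)
X_tr, X_te, y_tr, y_te = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

coef = fit_ols(X_tr, y_tr)
r2_train = r2(y_tr, predict(coef, X_tr))                # goodness of fit
q2 = q2_loo(X_tr, y_tr)                                 # internal robustness
r2_scrambled = y_scramble_r2(X_tr, y_tr, 200, rng)      # chance correlation check
q2_ext = q2_f1(y_tr, y_te, predict(coef, X_te))         # external predictivity
h_star = 3 * (p + 1) / n_train                          # assumed leverage cut-off
inside_domain = leverages(X_tr, X_te) <= h_star         # applicability domain flag

print(f"R2={r2_train:.3f}  Q2loo={q2:.3f}  Q2ext={q2_ext:.3f}")
print(f"mean scrambled R2={r2_scrambled.mean():.3f}")
print(f"test compounds inside domain: {inside_domain.sum()} of {n_test}")
```

A model is usually trusted only if Q² stays close to R², the scrambled R² values fall well below the real one, and predictions are reported only for compounds flagged as inside the domain. Acceptance thresholds vary between studies and are deliberately left out of this sketch.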