Gajewicz-Skretna Agnieszka, Kar Supratik, Piotrowska Magdalena, Leszczynski Jerzy
Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland.
Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, 1400 J. R. Lynch Street, P. O. Box 17910, Jackson, MS, 39217, USA.
J Cheminform. 2021 Feb 12;13(1):9. doi: 10.1186/s13321-021-00484-5.
The ability to accurately predict the biological response (biological activity/property/toxicity) of a given chemical makes quantitative structure-activity/property/toxicity relationship (QSAR/QSPR/QSTR) models unique among in silico tools. In addition, experimental data for selected species can be used as an independent variable, together with other structural and physicochemical variables, to predict the response for different species, which gives the quantitative activity-activity relationship (QAAR)/quantitative structure-activity-activity relationship (QSAAR) approach. Irrespective of the model type, the quality and reliability of a developed model need to be checked through multiple classical, stringent validation metrics. Among these, error-based metrics are particularly important, because the basic idea of a good predictive model is to improve prediction quality by reducing the prediction residuals for new query compounds. Following this concept, we have checked the predictive quality of QSAR and QSAAR models built with the kernel-weighted local polynomial regression (KwLPR) approach against traditional linear and non-linear regression-based tools such as multiple linear regression (MLR) and k nearest neighbors (kNN). Five datasets previously modeled using linear and non-linear regression methods were used to implement the KwLPR approach, followed by a comparison of their validation metric outcomes. In all five cases, the KwLPR-based models gave better results than the traditional approaches. The focus of the present study is not to develop a better or improved QSAR/QSAAR model over the previous ones, but to demonstrate the advantage, predictive power, and reliability of the KwLPR algorithm and to establish it as a novel, powerful cheminformatic tool.
To facilitate the use of the KwLPR algorithm for QSAR/QSPR/QSTR/QSAAR modeling, the authors provide an in-house KwLPR.RMD script written in the open-source R programming language.
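To make the KwLPR idea concrete, the Python sketch below implements a generic kernel-weighted local linear regression (polynomial degree 1) and compares it with a plain kNN baseline using the root-mean-square error (RMSE), one common error-based validation metric. It is a minimal sketch under stated assumptions, not the authors' KwLPR.RMD script (which is written in R): the Gaussian kernel, the bandwidth values, the small ridge term, the toy data and all function names (`gaussian_kernel`, `solve`, `kwlpr_predict`, `knn_predict`, `rmse`) are hypothetical choices made for illustration. For each query compound, the training compounds are weighted by their distance in descriptor space, a weighted least-squares linear model is fitted around the query, and the fitted intercept is taken as the prediction.

```python
import math
import random

# Hypothetical illustration of kernel-weighted local linear regression (degree-1 KwLPR).
# NOT the authors' KwLPR.RMD implementation; the Gaussian kernel, bandwidth and
# ridge term are assumptions chosen for this sketch.


def gaussian_kernel(dist, bandwidth):
    """Gaussian weight for a Euclidean distance in descriptor space."""
    return math.exp(-0.5 * (dist / bandwidth) ** 2)


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("local design matrix is (nearly) singular")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def kwlpr_predict(X, y, x0, bandwidth, ridge=1e-8):
    """Predict the response at query x0 with a kernel-weighted local linear fit."""
    # Centre descriptors on the query so the fitted intercept is the prediction at x0.
    Z = [[1.0] + [xi - qi for xi, qi in zip(row, x0)] for row in X]
    w = [gaussian_kernel(math.dist(row, x0), bandwidth) for row in X]
    p = len(Z[0])
    # Weighted normal equations: (Z^T W Z) beta = Z^T W y.
    A = [[sum(wk * zk[i] * zk[j] for wk, zk in zip(w, Z)) for j in range(p)] for i in range(p)]
    b = [sum(wk * zk[i] * yk for wk, zk, yk in zip(w, Z, y)) for i in range(p)]
    for i in range(1, p):
        A[i][i] += ridge  # tiny ridge on the slopes only, for numerical stability
    return solve(A, b)[0]


def knn_predict(X, y, x0, k):
    """Baseline kNN: mean response of the k training compounds closest to x0."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x0))[:k]
    return sum(y[i] for i in nearest) / k


def rmse(observed, predicted):
    """Root-mean-square error, one common error-based validation metric."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


if __name__ == "__main__":
    # Invented toy data: two descriptors, non-linear response.
    random.seed(0)

    def response(x):
        return math.sin(2 * x[0]) + 0.5 * x[1] ** 2

    X_train = [[random.uniform(-2, 2), random.uniform(-2, 2)] for _ in range(80)]
    X_test = [[random.uniform(-1.5, 1.5), random.uniform(-1.5, 1.5)] for _ in range(20)]
    y_train = [response(x) for x in X_train]
    y_test = [response(x) for x in X_test]
    pred_lpr = [kwlpr_predict(X_train, y_train, x, bandwidth=0.5) for x in X_test]
    pred_knn = [knn_predict(X_train, y_train, x, k=5) for x in X_test]
    print("test RMSE, local linear:", round(rmse(y_test, pred_lpr), 4))
    print("test RMSE, kNN (k=5):   ", round(rmse(y_test, pred_knn), 4))
```

The bandwidth controls how local the fit is: as it grows very large, every weight approaches 1 and the local linear fit converges to the ordinary least-squares (MLR) prediction, while a small bandwidth lets the model follow non-linear structure in descriptor space. In practice the bandwidth (and the polynomial degree) would be tuned, for example by cross-validation on the training set.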