Banerjee Arkaprava, Roy Kunal
Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.
Chem Res Toxicol. 2023 Mar 20;36(3):446-464. doi: 10.1021/acs.chemrestox.2c00374. Epub 2023 Feb 22.
The novel quantitative read-across structure-activity relationship (q-RASAR) approach uses read-across-derived similarity functions within the quantitative structure-activity relationship (QSAR) modeling framework for supervised model generation. The aim of this study is to explore how this workflow enhances the external (test set) prediction quality of conventional QSAR models by incorporating novel similarity-based functions as additional descriptors while using the same level of chemical information. To establish this, five different toxicity data sets, for which QSAR models were reported previously, were considered in the q-RASAR modeling exercise, which uses chemical similarity-derived measures. For ease of comparison, the present analysis used the identical sets of chemical features and the same training and test set compositions as reported previously. The RASAR descriptors were calculated based on a chosen similarity measure with the default setting of the relevant hyperparameter(s) and were then combined with the original structural and physicochemical descriptors. The number of selected features was further optimized by a grid search applied to the respective training sets. These features were then used to develop multiple linear regression (MLR) q-RASAR models, which show enhanced predictivity compared to the previously developed QSAR models. Moreover, various other machine learning (ML) algorithms, such as support vector machine (SVM), linear SVM, random forest, partial least squares, and ridge regression, were also employed with the same feature combinations as used in the MLR models to compare the prediction qualities. The q-RASAR models for the five data sets each include at least one of the RASAR descriptors, , and , suggesting that these are important determinants of similarity that contribute to the development of predictive q-RASAR models, as is also evident from the SHAP analysis of the models.
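The descriptor-augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian-kernel similarity, the three derived quantities (similarity-weighted response of the nearest training compounds, their average similarity, and their maximum similarity), the function names, and the values of `k` and `sigma` are all assumptions chosen for illustration, and the toy data are invented. Training compounds are excluded from their own neighbour lists so that their own response does not leak into their descriptors. The later steps (feature selection by grid search and MLR fitting) are not shown.

```python
import math

def gaussian_similarity(a, b, sigma=1.0):
    """Gaussian-kernel similarity in (0, 1], based on squared Euclidean distance."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def similarity_descriptors(query, train_X, train_y, k=3, sigma=1.0, skip_index=None):
    """Return (weighted_pred, avg_sim, max_sim) from the k most similar training compounds.

    skip_index removes one training compound (the query itself) from the candidates.
    """
    scored = [
        (gaussian_similarity(query, x, sigma), y)
        for i, (x, y) in enumerate(zip(train_X, train_y))
        if i != skip_index
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[:k]
    sims = [s for s, _ in top]
    total = sum(sims)
    if total > 0:
        # Similarity-weighted average of the neighbours' responses.
        weighted_pred = sum(s * y for s, y in top) / total
    else:
        # All similarities underflowed to zero: fall back to a plain mean.
        weighted_pred = sum(y for _, y in top) / len(top)
    return weighted_pred, total / len(sims), max(sims)

def augment(X, train_X, train_y, k=3, sigma=1.0, is_training=False):
    """Append the three similarity-derived columns to every row of X."""
    rows = []
    for i, row in enumerate(X):
        extra = similarity_descriptors(
            row, train_X, train_y, k, sigma,
            skip_index=i if is_training else None,
        )
        rows.append(list(row) + list(extra))
    return rows

# Invented toy data: two original descriptors per compound.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [1.0, 2.0, 3.0, 10.0]
test_X = [[0.0, 0.0]]

train_aug = augment(train_X, train_X, train_y, is_training=True)
test_aug = augment(test_X, train_X, train_y)
```

Each augmented row holds the original descriptors followed by the three similarity-derived columns, so any regression method can be fitted on `train_aug` and evaluated on `test_aug` without changing the train/test split.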