On Some Novel Similarity-Based Functions Used in the ML-Based q-RASAR Approach for Efficient Quantitative Predictions of Selected Toxicity End Points.

Author information

Banerjee Arkaprava, Roy Kunal

Affiliation

Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.

Publication information

Chem Res Toxicol. 2023 Mar 20;36(3):446-464. doi: 10.1021/acs.chemrestox.2c00374. Epub 2023 Feb 22.

Abstract

The novel quantitative read-across structure-activity relationship (q-RASAR) approach uses read-across-derived similarity functions within the quantitative structure-activity relationship (QSAR) modeling framework for supervised model generation. The aim of this study is to explore how this workflow enhances the external (test set) prediction quality of conventional QSAR models by incorporating some novel similarity-based functions as additional descriptors while using the same level of chemical information. To establish this, five different toxicity data sets, for which QSAR models were reported previously, were considered in the q-RASAR modeling exercise, which uses chemical similarity-derived measures. For ease of comparison, the present analysis used the same sets of chemical features and the same training and test set compositions as reported previously. The RASAR descriptors were calculated based on a chosen similarity measure with the default setting of the relevant hyperparameter(s) and were then combined with the original structural and physicochemical descriptors. The number of selected features was further optimized by a grid search applied to the respective training sets. These features were then used to develop multiple linear regression (MLR) q-RASAR models, which show enhanced predictivity compared with the previously developed QSAR models. Moreover, various other ML algorithms, namely support vector machine (SVM), linear SVM, random forest, partial least squares, and ridge regression, were also employed with the same feature combinations as in the MLR models to compare prediction quality. The q-RASAR models for the five data sets each possess at least one of the RASAR descriptors [...], suggesting that these are important determinants of similarity that contribute to the development of predictive q-RASAR models, as is also evident from the SHAP analysis of the models.
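
The descriptor-generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity measure (an inverse Euclidean distance), the neighbor count `k`, and the three descriptor names (`weighted_pred`, `avg_similarity`, `weighted_sd`) are all assumptions chosen for illustration, and the abstract does not state which measure or hyperparameter values were used. The sketch shows only the general idea of computing similarity-based quantities from the closest training compounds and appending them to the original descriptors.

```python
import math

def euclidean_similarity(a, b):
    """Map Euclidean distance to a similarity in (0, 1]; identical vectors give 1.
    Assumed measure for illustration only."""
    return 1.0 / (1.0 + math.dist(a, b))

def rasar_descriptors(query, train_X, train_y, k=3):
    """Compute three illustrative similarity-based descriptors for one compound.
    The descriptor names are hypothetical, not those of the paper."""
    # Rank training compounds by similarity to the query and keep the k closest.
    scored = sorted(
        ((euclidean_similarity(query, x), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    sims = [s for s, _ in scored]
    resp = [y for _, y in scored]
    total = sum(sims)
    # Similarity-weighted mean response of the neighbors (a read-across style estimate).
    weighted_pred = sum(s * y for s, y in zip(sims, resp)) / total
    # Mean similarity of the query to its selected neighbors.
    avg_similarity = total / len(sims)
    # Similarity-weighted spread of the neighbor responses around the estimate.
    weighted_sd = math.sqrt(
        sum(s * (y - weighted_pred) ** 2 for s, y in zip(sims, resp)) / total
    )
    return {
        "weighted_pred": weighted_pred,
        "avg_similarity": avg_similarity,
        "weighted_sd": weighted_sd,
    }

def augment(X, train_X, train_y, k=3, is_training=False):
    """Append the illustrative descriptors to each row of original descriptors."""
    rows = []
    for i, x in enumerate(X):
        if is_training:
            # Leave the compound itself out so it cannot be its own neighbor.
            ref_X = train_X[:i] + train_X[i + 1:]
            ref_y = train_y[:i] + train_y[i + 1:]
        else:
            ref_X, ref_y = train_X, train_y
        d = rasar_descriptors(x, ref_X, ref_y, k)
        rows.append(list(x) + [d["weighted_pred"], d["avg_similarity"], d["weighted_sd"]])
    return rows

# Toy data: two original descriptors per compound and one response value.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
train_y = [1.0, 2.0, 2.0, 5.0]
test_X = [[0.5, 0.5]]

train_aug = augment(train_X, train_X, train_y, k=2, is_training=True)
test_aug = augment(test_X, train_X, train_y, k=2)
print(train_aug)
print(test_aug)
```

The augmented rows could then be passed to any regression routine, with the number of retained features tuned on the training set only, in the spirit of the workflow summarized above. Test compounds are compared only against training compounds, so no test-set response enters the descriptors.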
