Prediction-Inspired Intelligent Training for the Development of Classification Read-across Structure-Activity Relationship (c-RASAR) Models for Organic Skin Sensitizers: Assessment of Classification Error Rate from Novel Similarity Coefficients.

Affiliation

Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.

Publication information

Chem Res Toxicol. 2023 Sep 18;36(9):1518-1531. doi: 10.1021/acs.chemrestox.3c00155. Epub 2023 Aug 16.

Abstract

Advances in cheminformatics have reduced the need for animal testing to estimate the activity, property, and toxicity of query chemicals. Read-across structure-activity relationship (RASAR) is an emerging concept that utilizes various similarity functions derived from chemical information to develop highly predictive models. Unlike quantitative structure-activity relationship (QSAR) models, RASAR descriptors of a query compound are computed from its close congeners instead of the compound itself, thus targeting predictions in the model training phase. The objective of the present study is not to propose new QSAR models for skin sensitization but to demonstrate the enhancement in the quality of predictions of the skin-sensitizing potential of organic compounds by developing classification-based RASAR (c-RASAR) models. A diverse, previously curated data set was collected from the literature, and 2D descriptors were computed for it. The extracted essential features were then used to develop a classification-based linear discriminant analysis (LDA) QSAR model. Furthermore, from the read-across-based predictions, RASAR descriptors were calculated using the basic settings of the hyperparameters for the Laplacian Kernel-based optimum similarity measure. After feature selection, an LDA c-RASAR model was developed, which surpassed the prediction quality of the LDA QSAR model. Various other combinations of RASAR descriptors were also used to develop additional c-RASAR models, all showing better prediction quality than the LDA QSAR model while using fewer descriptors. Various other machine learning c-RASAR models were also developed for comparison purposes. In this work, we have proposed and analyzed three new similarity metrics. The first one is an indicator variable used to generate a simple univariate c-RASAR model with good prediction ability, while the remaining two are similarity indices used to analyze possible activity cliffs in the training and test sets and are believed to play an important role in the modelability analysis of data sets.
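
The idea that RASAR descriptors of a query compound come from its close congeners rather than from the compound itself can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the descriptor names, the toy data, and the values of `k` and `gamma` are hypothetical and are not the descriptors, metrics, or hyperparameter settings used in the study. Only the general form of a Laplacian kernel, exp(-gamma * L1 distance), is standard.

```python
import math

def laplacian_kernel(x, y, gamma=1.0):
    """Laplacian kernel similarity: exp(-gamma * L1 distance)."""
    l1 = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * l1)

def rasar_like_descriptors(query, train_X, train_y, k=3, gamma=1.0):
    """Illustrative similarity-derived descriptors for one query compound.

    The descriptors are computed only from the k most similar training
    (source) compounds and their known class labels (1 = sensitizer,
    0 = non-sensitizer). Names and definitions are hypothetical.
    """
    # Rank source compounds by similarity to the query and keep the top k.
    neighbours = sorted(
        ((laplacian_kernel(query, x, gamma), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    total = sum(s for s, _ in neighbours)
    pos_sims = [s for s, y in neighbours if y == 1]
    neg_sims = [s for s, y in neighbours if y == 0]

    return {
        # Similarity-weighted fraction of positive close source compounds.
        "weighted_pos_fraction": sum(s * y for s, y in neighbours) / total if total > 0 else 0.0,
        # Highest similarity to a positive / negative close source compound.
        "max_sim_pos": max(pos_sims, default=0.0),
        "max_sim_neg": max(neg_sims, default=0.0),
        # Average similarity to the selected close source compounds.
        "mean_sim": total / len(neighbours),
    }

# Toy descriptor vectors (hypothetical, already scaled) and class labels.
train_X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]]
train_y = [0, 0, 1, 1]

desc = rasar_like_descriptors([1.0, 0.9], train_X, train_y, k=3, gamma=1.0)
print(desc)
```

In a workflow of this kind, such descriptors would be computed for every training and test compound (a training compound would need to be excluded from its own neighbour list) and then passed to a classifier such as LDA in place of the original 2D descriptors.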
