

Generalizability Improvement of Interpretable Symbolic Regression Models for Quantitative Structure-Activity Relationships.

Author information

Shirasawa Raku, Takaki Katsushi, Miyao Tomoyuki

Affiliations

Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.

Advanced Research Laboratory, Technology Infrastructure Center, Technology Platform, Sony Group Corporation, Atsugi Tec., 4-14-1 Asahi-cho, Atsugi-shi, Kanagawa 243-0014, Japan.

Publication information

ACS Omega. 2024 Feb 16;9(8):9463-9474. doi: 10.1021/acsomega.3c09047. eCollection 2024 Feb 27.

Abstract

In the pursuit of optimal quantitative structure-activity relationship (QSAR) models, two factors are paramount: robust predictive ability and model interpretability. Symbolic regression (SR) searches for mathematical expressions that explain a training data set, so the models it provides are globally interpretable. We previously proposed an SR method that generates expressions interpretable by humans. This study introduces an enhanced SR method, termed filter-induced genetic programming 2 (FIGP2), as an extension of that method. FIGP2 is designed to improve the generalizability of SR models and to be applicable to data sets that employ cost-intensive descriptors. It incorporates two major improvements: a modified domain filter that eliminates diverging expressions based on optimal calculation, and a stability metric that penalizes expressions likely to lead to overfitting. Our retrospective comparative analysis using 12 structure-activity relationship data sets showed that FIGP2 surpassed the previously proposed SR method and conventional modeling methods, such as support vector regression and multivariate linear regression, in predictive performance. The mathematical expressions generated by FIGP2 were relatively simple and did not diverge within the domain of the function. Taken together, FIGP2 can be used to build interpretable regression models with predictive ability.
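The abstract does not describe how the domain filter is implemented. As a rough illustration of the general idea only, the sketch below rejects a candidate expression if it could diverge anywhere inside the descriptor ranges, using plain interval arithmetic. The tuple encoding of expressions, the operator names, and the function names `interval_eval` and `passes_domain_filter` are all hypothetical and are not taken from FIGP2.

```python
import math

# Hypothetical encoding: an expression is a number, a variable name (str),
# or a tuple such as ("add", a, b), ("div", a, b), or ("log", a).

def interval_eval(expr, bounds):
    """Return (lo, hi) bounds of expr over the box `bounds`, or None if the
    expression may diverge or be undefined somewhere inside that box."""
    if isinstance(expr, (int, float)):
        return (float(expr), float(expr))
    if isinstance(expr, str):
        return bounds[expr]
    op, *args = expr
    vals = [interval_eval(a, bounds) for a in args]
    if any(v is None for v in vals):
        return None  # a sub-expression already failed the check
    if op == "add":
        (a, b), (c, d) = vals
        return (a + c, b + d)
    if op == "sub":
        (a, b), (c, d) = vals
        return (a - d, b - c)
    if op == "mul":
        (a, b), (c, d) = vals
        products = [a * c, a * d, b * c, b * d]
        return (min(products), max(products))
    if op == "div":
        (a, b), (c, d) = vals
        if c <= 0.0 <= d:
            return None  # denominator range contains zero
        quotients = [a / c, a / d, b / c, b / d]
        return (min(quotients), max(quotients))
    if op == "log":
        ((a, b),) = vals
        if a <= 0.0:
            return None  # argument range reaches zero or below
        return (math.log(a), math.log(b))
    raise ValueError(f"unknown operator: {op}")

def passes_domain_filter(expr, bounds):
    """True if expr is guaranteed finite over the whole descriptor box."""
    return interval_eval(expr, bounds) is not None

# 1 / (x - 2) has a pole at x = 2, which lies inside [0, 5] but not [3, 5].
candidate = ("div", 1, ("sub", "x", 2))
print(passes_domain_filter(candidate, {"x": (0.0, 5.0)}))
print(passes_domain_filter(candidate, {"x": (3.0, 5.0)}))
```

Interval arithmetic is conservative: it can reject an expression that is actually safe when the same variable appears several times, because each occurrence is bounded independently. A filter of this kind trades some search space for a guarantee that surviving expressions stay finite over the stated ranges.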


