

Regression Modelability Index: A New Index for Prediction of the Modelability of Data Sets in the Development of QSAR Regression Models.

Affiliation

Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain.

Publication information

J Chem Inf Model. 2018 Oct 22;58(10):2069-2084. doi: 10.1021/acs.jcim.8b00313. Epub 2018 Sep 25.

Abstract

Predicting whether a data set can be modeled by a statistical algorithm in the development of quantitative structure-activity relationship (QSAR) regression models is an important issue: it allows researchers to avoid unnecessary work, wasted time, and/or the need to depurate (clean up) the molecular composition of the data set in order to improve the model's accuracy. In this paper, we propose and formulate a new index that correlates with the performance of QSAR models. This index, the regression modelability index, has a very low computational cost and is based on the rivality between the nearest neighbors of the molecules in the data set. This rivality measures the capability of each molecule in the data set to be correctly predicted by a regression algorithm. Using 40 data sets with very different characteristics in terms of number of molecules and activity values, we show a high correlation between the proposed regression modelability index and the cross-validated correlation coefficient (Q²), reaching r values of 0.8. In addition, we describe the ability of this index to identify the outliers detected by the regression algorithms, which allows easy data set depuration in the first stages of constructing QSAR regression models.
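The abstract describes the index only qualitatively: each molecule is judged by whether its nearest neighbours in descriptor space have similar or "rival" activities. The sketch below is a toy illustration of that idea under stated assumptions; it is not the published regression modelability index, whose exact formula is not given here. Assumptions: Euclidean distance over descriptor vectors, a hypothetical activity tolerance `tol` that separates similar neighbours from rivals, and the hypothetical helper names `neighbour_conflict_scores` and `toy_modelability`. For comparison it also computes the cross-validated Q² mentioned above, using the standard leave-one-out definition Q² = 1 − PRESS/SS_tot with a k-nearest-neighbour regressor chosen here only for simplicity.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - v) ** 2 for x, v in zip(a, b)))


def loo_q2_knn(X: List[Vector], y: List[float], k: int = 1) -> float:
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot for a k-nearest-neighbour regressor."""
    n = len(y)
    press = 0.0
    for i in range(n):
        # Rank every other molecule by distance to molecule i (i is left out).
        ranked = sorted((euclidean(X[i], X[j]), j) for j in range(n) if j != i)
        pred = sum(y[j] for _, j in ranked[:k]) / k
        press += (y[i] - pred) ** 2
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_tot


def neighbour_conflict_scores(X: List[Vector], y: List[float], tol: float) -> List[float]:
    """HYPOTHETICAL per-molecule score in [-1, 1] (not the paper's formula).

    Neighbours with |y_j - y_i| <= tol count as 'same'; all others are 'rivals'.
    Negative: the closest same-activity neighbour is nearer than any rival.
    Positive: a rival is nearer, so the molecule is likely to be mispredicted.
    """
    scores = []
    for i in range(len(y)):
        d_same = d_rival = math.inf
        for j in range(len(y)):
            if j == i:
                continue
            d = euclidean(X[i], X[j])
            if abs(y[j] - y[i]) <= tol:
                d_same = min(d_same, d)
            else:
                d_rival = min(d_rival, d)
        if math.isinf(d_same) and math.isinf(d_rival):
            scores.append(0.0)  # no other molecules at all
        elif math.isinf(d_same):
            scores.append(1.0)  # every neighbour is a rival
        elif math.isinf(d_rival):
            scores.append(-1.0)  # no rival anywhere
        elif d_same + d_rival == 0:
            scores.append(0.0)  # exact duplicates with conflicting activities
        else:
            scores.append((d_same - d_rival) / (d_same + d_rival))
    return scores


def toy_modelability(scores: List[float]) -> float:
    """HYPOTHETICAL data-set summary: fraction of molecules with a negative score."""
    return sum(s < 0 for s in scores) / len(scores)


# Toy data: one descriptor per molecule; the last molecule is a deliberate outlier.
X = [[0.0], [0.1], [1.0], [1.1], [0.05]]
y = [1.0, 1.1, 5.0, 5.2, 5.0]
scores = neighbour_conflict_scores(X, y, tol=0.5)
print(max(range(len(y)), key=scores.__getitem__))  # → 4 (the outlier)
print(toy_modelability(scores), loo_q2_knn(X, y))
print(toy_modelability(neighbour_conflict_scores(X[:4], y[:4], tol=0.5)), loo_q2_knn(X[:4], y[:4]))
```

In this toy case, removing the flagged molecule raises both the hypothetical modelability score and the leave-one-out Q², which mirrors the two uses the abstract describes: anticipating model quality and spotting outliers before fitting.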
