Regression Modelability Index: A New Index for Prediction of the Modelability of Data Sets in the Development of QSAR Regression Models.

Affiliation

Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain.

Publication information

J Chem Inf Model. 2018 Oct 22;58(10):2069-2084. doi: 10.1021/acs.jcim.8b00313. Epub 2018 Sep 25.

DOI: 10.1021/acs.jcim.8b00313
PMID: 30205684
Abstract

Predicting whether a data set can be modeled by a statistical algorithm is an important issue in the development of quantitative structure-activity relationship (QSAR) regression models. It allows researchers to avoid unnecessary tasks, wasted time, and/or the need to curate ("depurate") the molecular composition of the data set in order to improve the model's accuracy. In this paper, we propose and formulate a new index that correlates with the performance of QSAR models. This index, the regression modelability index, has a very low computational cost and is based on the rivality between the nearest neighbors of the molecules in the data set. This rivality measures how likely each molecule of the data set is to be correctly predicted by a regression algorithm. In this study, using 40 data sets that differ widely in number of molecules and activity values, we show a high correlation between the proposed regression modelability index and the correlation coefficient in cross-validation (Q), reaching r values of 0.8. In addition, we describe the ability of this index to identify the outliers detected by the regression algorithms, allowing easy data set curation in the first stages of the construction of QSAR regression models.
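
The abstract does not give the formula for the regression modelability index, so the following is not the authors' method. It is only a minimal sketch of the general idea of a cheap nearest-neighbor check on a regression data set: if molecules that are close in descriptor space have very different activities, a regression model is likely to struggle. The function names, the Euclidean distance, the range normalization, and the `threshold` value are all assumptions made for illustration.

```python
import math

def neighbor_disagreement(X, y, k=1):
    """Per-molecule activity disagreement with its k nearest neighbors.

    X is a list of descriptor vectors, y the list of activity values.
    Each score is |y_i - mean(y of neighbors)| divided by the activity
    range, so scores lie in [0, 1]. This is an illustrative proxy, not
    the published rivality measure.
    """
    n = len(X)
    if n != len(y) or n <= k:
        raise ValueError("X and y must match and contain more than k rows")
    y_range = max(y) - min(y)
    if y_range == 0:
        return [0.0] * n  # constant activities: no disagreement possible
    scores = []
    for i in range(n):
        # Rank every other molecule by Euclidean distance in descriptor space.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        predicted = sum(y[j] for j in others[:k]) / k
        scores.append(abs(y[i] - predicted) / y_range)
    return scores

def agreement_index(X, y, k=1):
    """Data-set-level score in [0, 1]; higher means neighbors agree more."""
    scores = neighbor_disagreement(X, y, k)
    return 1.0 - sum(scores) / len(scores)

def flag_outliers(X, y, k=1, threshold=0.5):
    """Indices of molecules whose disagreement exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(neighbor_disagreement(X, y, k)) if s > threshold]

# Toy data: one descriptor per molecule (hypothetical values).
X = [[0.0], [1.0], [2.1], [3.3], [4.6], [6.0]]
smooth = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0]     # activity follows the descriptor
shuffled = [6.0, 0.0, 4.6, 1.0, 3.3, 2.1]   # same values, no structure
with_outlier = [0.0, 1.0, 2.1, 3.3, 4.6, 20.0]

print(agreement_index(X, smooth), agreement_index(X, shuffled))
print(flag_outliers(X, with_outlier))
```

The per-molecule scores play both roles mentioned in the abstract: averaged, they give a data-set-level number to compare against cross-validated model performance, and individually they point at molecules worth inspecting before modeling. Each call costs one pass over all pairwise distances, which is small next to training and cross-validating a regression model.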
