Study of Data Set Modelability: Modelability, Rivality, and Weighted Modelability Indexes.

Affiliation

Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain.

Publication information

J Chem Inf Model. 2018 Sep 24;58(9):1798-1814. doi: 10.1021/acs.jcim.8b00188. Epub 2018 Sep 5.

Abstract

Knowing how well a data set can be modeled in the early stages of building quantitative structure-activity relationship (QSAR) prediction models is important, because it can reduce the effort and time needed to select or reject data sets and to refine their composition. The modelability index (MODI) is based on counting the first nearest neighbors of the molecules in the data set and is a standard measurement adopted by the QSAR community. In this paper, we revisit the calculation of the modelability index and propose a more formal formulation that extends the calculation to the first nearest neighbors belonging to each class present in the data set. This new formulation also allows the calculation of the rivality index, which measures the presence of correctly classifiable molecules and of activity cliffs. When the rivality index is weighted by the cardinality of the neighborhood of each molecule in the data set, the resulting weighted modelability index is highly correlated with the correct classification rate (QSAR_CCR) obtained when building QSAR models with different classification algorithms. For the different algorithms, the weighted modelability index yields correlation coefficients (r) higher than 0.9, slopes close to 1, and bias close to zero.
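
The abstract names the indexes without giving their formulas. The following is a minimal sketch, under stated assumptions, of how the classic MODI and a per-molecule rivality index can be computed from nearest-neighbor distances. Assumptions: each molecule is a plain descriptor vector compared by Euclidean distance; MODI follows the commonly used per-class definition (the fraction of molecules whose first nearest neighbor shares their class, averaged over classes); the rivality formula RI_i = (d_same - d_other) / (d_same + d_other) is our assumption about the paper's definition and should be checked against the full text. The function names `neighbor_distances`, `modi`, and `rivality_indexes` and the toy data are hypothetical. The weighted modelability index is not sketched, because the abstract does not state its weighting scheme.

```python
import math


def neighbor_distances(X, y):
    """For each molecule i, return the distance to its nearest same-class
    molecule, the distance to its nearest other-class molecule, and the class
    label of its overall first nearest neighbor (brute force, O(n^2))."""
    n = len(X)
    d_same = [math.inf] * n
    d_other = [math.inf] * n
    nn_label = [None] * n
    for i in range(n):
        best = math.inf
        for j in range(n):
            if i == j:
                continue
            d = math.dist(X[i], X[j])  # Euclidean distance (assumed metric)
            if y[j] == y[i]:
                d_same[i] = min(d_same[i], d)
            else:
                d_other[i] = min(d_other[i], d)
            if d < best:  # on ties the lower index wins
                best, nn_label[i] = d, y[j]
    return d_same, d_other, nn_label


def modi(X, y):
    """Classic MODI (assumed definition): per class, the fraction of molecules
    whose first nearest neighbor has the same class, averaged over classes."""
    _, _, nn_label = neighbor_distances(X, y)
    per_class = []
    for c in sorted(set(y)):
        members = [i for i, label in enumerate(y) if label == c]
        hits = sum(1 for i in members if nn_label[i] == c)
        per_class.append(hits / len(members))
    return sum(per_class) / len(per_class)


def rivality_indexes(X, y):
    """Hypothetical rivality index per molecule, in [-1, 1]:
    RI_i = (d_same - d_other) / (d_same + d_other).
    Negative: closer to its own class (likely correctly classifiable).
    Positive: closer to another class (candidate activity cliff).
    Assumes at least two classes and at least two molecules per class;
    otherwise a distance stays infinite and the ratio is undefined."""
    d_same, d_other, _ = neighbor_distances(X, y)
    ri = []
    for s, o in zip(d_same, d_other):
        total = s + o
        ri.append(0.0 if total == 0 else (s - o) / total)
    return ri


# Toy example: two 2-D clusters, with one class-1 molecule sitting
# next to the class-0 cluster.
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (0.0, 1.8)]
y = [0, 0, 1, 1, 1]
print(round(modi(X, y), 4))                            # 0.5833
print([round(r, 2) for r in rivality_indexes(X, y)])   # [-0.29, 0.11, -0.73, -0.75, 0.76]
```

In the toy data, the last molecule (class 1) lies next to the class-0 cluster and gets a strongly positive index, which is the kind of molecule the abstract describes as an activity cliff. Brute-force distances keep the sketch dependency-free; for real data sets a precomputed distance matrix or a spatial index would be the usual choice.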
