Rivality index neighbourhood algorithm with density and distances weighted schemes for the building of robust QSAR classification models with high reliable applicability domain.

Affiliation

Department of Computing and Numerical Analysis, Campus de Rabanales, University of Córdoba, Córdoba, Spain.

Publication information

SAR QSAR Environ Res. 2019 Aug;30(8):587-615. doi: 10.1080/1062936X.2019.1644666. Epub 2019 Aug 30.

Abstract

The rivality index (RI) is a normalized distance measurement between a molecule and its first nearest neighbours, providing a robust prediction of the activity of a molecule based on the known activity of its nearest neighbours. Negative values of the RI describe molecules that would be correctly classified by a statistical algorithm and, conversely, positive values describe molecules that classification algorithms would detect as outliers. In this paper, we describe a classification algorithm based on the RI and propose four weighting schemes (kernels) for its calculation, each measuring different characteristics of the neighbourhood of every molecule in the dataset at established values of the neighbour threshold. The results demonstrate that the proposed RI-based classification algorithm generates more reliable and robust classification models than many of the most widely used and well-known machine learning algorithms. These results have been validated using 20 balanced and unbalanced benchmark datasets of different sizes and modelability. The classification models generated provide valuable information about the molecules of the dataset, the applicability domain of the models and the reliability of the predictions.
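To make the sign convention concrete, here is a minimal Python sketch of a rivality-style index on toy data. The abstract does not state the formula, so the normalisation used here, (d_same - d_rival) / (d_same + d_rival), is an assumption chosen only so that the value lies in [-1, 1] and is negative when the nearest same-class neighbour is closer than the nearest other-class neighbour. The function name, the Euclidean distance, and the toy descriptors are likewise illustrative, and the four weighting kernels from the paper are not reproduced.

```python
import math


def rivality_index(points, labels, i):
    """Illustrative rivality-style index for sample i (assumed formula).

    d_same  = distance to the nearest neighbour with the same class as i
    d_rival = distance to the nearest neighbour with a different class
    Returns (d_same - d_rival) / (d_same + d_rival), a value in [-1, 1].
    """
    d_same = math.inf
    d_rival = math.inf
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j == i:
            continue  # a sample is not its own neighbour
        d = math.dist(points[i], p)  # Euclidean distance (assumption)
        if lab == labels[i]:
            d_same = min(d_same, d)
        else:
            d_rival = min(d_rival, d)
    if math.isinf(d_same) or math.isinf(d_rival):
        raise ValueError("each sample needs a neighbour in both classes")
    total = d_same + d_rival
    return 0.0 if total == 0 else (d_same - d_rival) / total


# Hypothetical 2-D descriptors: two tight clusters plus one "active"
# molecule placed next to the "inactive" cluster.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 5.4), (6.0, 5.0)]
labels = ["active", "active", "inactive", "inactive", "active"]

indices = [rivality_index(points, labels, i) for i in range(len(points))]
for lab, ri in zip(labels, indices):
    flag = "likely outlier" if ri > 0 else "likely well classified"
    print(f"{lab:9s} RI = {ri:+.3f}  ({flag})")
```

In this toy set the four clustered samples get negative values, while the isolated "active" sample near the "inactive" cluster gets a positive value, matching the interpretation given in the abstract.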
