A Multi-Objective Genetic Algorithm for Outlier Removal.

Author affiliations

Department of Management, Bar-Ilan University, Ramat-Gan 52900, Israel.

School of Management and Economics, The Academic College of Tel-Aviv-Yafo, Yafo 61083, Israel.

Publication information

J Chem Inf Model. 2015 Dec 28;55(12):2507-18. doi: 10.1021/acs.jcim.5b00515. Epub 2015 Nov 23.

DOI: 10.1021/acs.jcim.5b00515
PMID: 26553402

Abstract

Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models use mathematical models to correlate the activities of sets of compounds with their structure-derived descriptors. The presence of outliers, namely, compounds that differ in some respect from the rest of the data set, compromises the ability of statistical methods to derive QSAR models with good prediction statistics. Hence, outliers should be removed from data sets prior to model derivation. Here we present a new multi-objective genetic algorithm for the identification and removal of outliers based on the k nearest neighbors (kNN) method. The algorithm was used to remove outliers from three different data sets of pharmaceutical interest (logBBB, factor 7 inhibitors, and dihydrofolate reductase inhibitors), and its performance was compared with that of five other methods for outlier removal. The results suggest that the new algorithm provides filtered data sets that (1) better maintain the internal diversity of the parent data sets and (2) give rise to QSAR models with much better prediction statistics. Equally good filtered data sets in terms of these metrics were obtained when another objective function (termed "preservation") was added to the algorithm, forcing it to remove certain compounds only with low probability. This option is highly useful when specific compounds should preferably be kept in the final data set, either because they have favorable activities or because they represent interesting molecular scaffolds. We expect this new algorithm to be useful in future QSAR applications.
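As a rough illustration of the ideas in the abstract, the following Python sketch scores a candidate filtered data set on several objectives and compares candidates by Pareto dominance. It is not the authors' implementation: the abstract does not specify the objective functions, so the leave-one-out kNN error, the removed-compound count, the `preserve_weight` penalty, and all function names here are assumptions chosen for illustration. The genetic search itself (selection, crossover, mutation) is omitted.

```python
import numpy as np

def knn_loo_error(X, y, keep, k=2):
    """Leave-one-out kNN regression RMSE over the retained compounds.

    ASSUMPTION: an illustrative kNN-based objective, not the paper's definition.
    """
    Xk, yk = X[keep], y[keep]
    # Pairwise Euclidean distances in descriptor space.
    d = np.linalg.norm(Xk[:, None, :] - Xk[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a compound is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbors
    pred = yk[nn].mean(axis=1)             # predict activity from the neighbors
    return float(np.sqrt(np.mean((pred - yk) ** 2)))

def objectives(X, y, keep, preserve_weight=None, k=2):
    """Return a tuple of objectives, all to be minimized.

    keep: boolean mask, True for compounds retained in the filtered set.
    preserve_weight: hypothetical per-compound cost of removal, standing in
    for the "preservation" objective mentioned in the abstract.
    """
    removed = ~keep
    objs = [knn_loo_error(X, y, keep, k), int(removed.sum())]
    if preserve_weight is not None:
        objs.append(float(preserve_weight[removed].sum()))
    return tuple(objs)

def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Toy data: activity follows the single descriptor, except compound 3.
X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([0.0, 1.0, 2.0, 50.0, 4.0, 5.0])

keep_all = np.ones(6, dtype=bool)
keep_filtered = keep_all.copy()
keep_filtered[3] = False                   # candidate that drops compound 3

obj_all = objectives(X, y, keep_all)
obj_filtered = objectives(X, y, keep_filtered)

# With a preservation weight on compound 3, removing it incurs an extra cost.
weights = np.zeros(6)
weights[3] = 1.0
obj_preserved = objectives(X, y, keep_filtered, preserve_weight=weights)
```

In this toy example, removing the aberrant compound lowers the kNN error at the cost of one removed compound, so neither candidate dominates the other; a multi-objective search would keep both on its Pareto front.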


Similar articles

1. A Multi-Objective Genetic Algorithm for Outlier Removal.
   J Chem Inf Model. 2015 Dec 28;55(12):2507-18. doi: 10.1021/acs.jcim.5b00515. Epub 2015 Nov 23.
2. k-Nearest neighbors optimization-based outlier removal.
   J Comput Chem. 2015 Mar 30;36(8):493-506. doi: 10.1002/jcc.23803. Epub 2014 Dec 15.
3. Optimization of molecular representativeness.
   J Chem Inf Model. 2014 Jun 23;54(6):1567-77. doi: 10.1021/ci400715n. Epub 2014 May 19.
4. Combinatorial QSAR of ambergris fragrance compounds.
   J Chem Inf Comput Sci. 2004 Mar-Apr;44(2):582-95. doi: 10.1021/ci034203t.
5. Evaluation of QSAR Equations for Virtual Screening.
   Int J Mol Sci. 2020 Oct 22;21(21):7828. doi: 10.3390/ijms21217828.
6. 4D-QSAR study of HEPT derivatives by electron conformational-genetic algorithm method.
   SAR QSAR Environ Res. 2012 Jul;23(5-6):409-33. doi: 10.1080/1062936X.2012.665082. Epub 2012 Mar 27.
7. Regression Modelability Index: A New Index for Prediction of the Modelability of Data Sets in the Development of QSAR Regression Models.
   J Chem Inf Model. 2018 Oct 22;58(10):2069-2084. doi: 10.1021/acs.jcim.8b00313. Epub 2018 Sep 25.
8. Fuzzy ARTMAP prediction of biological activities for potential HIV-1 protease inhibitors using a small molecular data set.
   IEEE/ACM Trans Comput Biol Bioinform. 2011 Jan-Mar;8(1):80-93. doi: 10.1109/TCBB.2009.50.
9. General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity.
   J Chem Inf Model. 2018 Aug 27;58(8):1561-1575. doi: 10.1021/acs.jcim.8b00114. Epub 2018 Jul 17.
10. Does rational selection of training and test sets improve the outcome of QSAR modeling?
    J Chem Inf Model. 2012 Oct 22;52(10):2570-8. doi: 10.1021/ci300338w. Epub 2012 Oct 3.

Cited by

1. RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.
   J Cheminform. 2017 Jun 6;9(1):34. doi: 10.1186/s13321-017-0224-0.