Systematic artifacts in support vector regression-based compound potency prediction revealed by statistical and activity landscape analysis.

Author information

Balfer Jenny, Bajorath Jürgen

Affiliation

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113, Bonn, Germany.

Publication information

PLoS One. 2015 Mar 5;10(3):e0119301. doi: 10.1371/journal.pone.0119301. eCollection 2015.

DOI: 10.1371/journal.pone.0119301

PMID: 25742011

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4350943/

Abstract

Support vector machines are a popular machine learning method for many classification tasks in biology and chemistry. In addition, the support vector regression (SVR) variant is widely used for numerical property prediction. In chemoinformatics and pharmaceutical research, SVR has probably become the most popular approach for modeling non-linear structure-activity relationships (SARs) and predicting compound potency values. Herein, we have systematically generated and analyzed SVR prediction models for a variety of compound data sets with different SAR characteristics. Although these SVR models were accurate on the basis of global prediction statistics and not prone to overfitting, they consistently mispredicted highly potent compounds. Hence, SVR prediction models displayed clear limitations in regions of local SAR discontinuity. Compared with the observed activity landscapes of the compound data sets, landscapes generated from SVR potency predictions were partly flattened, and activity cliff information was lost. Taken together, these findings have implications for practical SVR applications. In particular, prospective SVR-based potency predictions should be treated with caution, because artificially low predictions are very likely for highly potent candidate compounds, which are the most important prediction targets.
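
The landscape argument above, that SVR-predicted potencies lose activity cliff information, can be illustrated with a small sketch. This is not the authors' workflow. It assumes fingerprints are stored as Python sets of on-bit indices and compared by Tanimoto similarity. It flags a pair as an activity cliff when similarity reaches a hypothetical threshold of 0.55 and the potency difference is at least 2 log units (100-fold). It also reports the structure-activity landscape index (SALI), |P_i - P_j| / (1 - sim), as a continuous measure. The four compounds and both potency vectors are invented toy values, not data from the study.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sali(p_i: float, p_j: float, sim: float) -> float:
    """Structure-activity landscape index: |potency difference| / (1 - similarity).

    Returns infinity for identical fingerprints, where the index is undefined.
    """
    if sim >= 1.0:
        return float("inf")
    return abs(p_i - p_j) / (1.0 - sim)


def activity_cliffs(fps, potencies, sim_threshold=0.55, min_delta=2.0):
    """Index pairs that are similar (Tanimoto >= sim_threshold, a hypothetical
    cutoff) but whose potencies differ by at least min_delta log units."""
    cliffs = []
    for i, j in combinations(range(len(fps)), 2):
        sim = tanimoto(fps[i], fps[j])
        if sim >= sim_threshold and abs(potencies[i] - potencies[j]) >= min_delta:
            cliffs.append((i, j))
    return cliffs


# Hypothetical toy data: four compounds, potencies in log units (e.g. pKi).
fps = [
    frozenset({1, 2, 3, 4, 5, 6}),
    frozenset({1, 2, 3, 4, 5, 7}),  # close analog of compound 0
    frozenset({1, 2, 3, 8, 9, 10}),
    frozenset({11, 12, 13, 14}),
]
observed = [5.0, 8.5, 6.0, 5.5]   # compound 1 is highly potent
predicted = [5.6, 6.9, 6.1, 5.7]  # invented predictions compressed toward the mean

if __name__ == "__main__":
    sim01 = tanimoto(fps[0], fps[1])
    print("cliffs (observed): ", activity_cliffs(fps, observed))
    print("cliffs (predicted):", activity_cliffs(fps, predicted))
    print(f"SALI 0-1 observed:  {sali(observed[0], observed[1], sim01):.2f}")
    print(f"SALI 0-1 predicted: {sali(predicted[0], predicted[1], sim01):.2f}")
```

With these toy values, the observed potencies yield one cliff, pair (0, 1), and the compressed predictions yield none. The SALI for that pair falls from 12.25 to 4.55, a miniature version of the landscape flattening described in the abstract.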

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/067e/4350943/f1d45fdd06b7/pone.0119301.g001.jpg

Similar articles

Exploring Alternative Strategies for the Identification of Potent Compounds Using Support Vector Machine and Regression Modeling.

J Chem Inf Model. 2019 Mar 25;59(3):983-992. doi: 10.1021/acs.jcim.8b00584. Epub 2018 Dec 14.

Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction.

ACS Omega. 2017 Oct 31;2(10):6371-6379. doi: 10.1021/acsomega.7b01079. Epub 2017 Oct 4.

Determination of Meta-Parameters for Support Vector Machine Linear Combinations.

Mol Inform. 2015 Feb;34(2-3):127-33. doi: 10.1002/minf.201400163. Epub 2015 Feb 17.

Prediction of compounds in different local structure-activity relationship environments using emerging chemical patterns.

J Chem Inf Model. 2014 May 27;54(5):1301-10. doi: 10.1021/ci500147b. Epub 2014 May 15.

Potency-directed similarity searching using support vector machines.

Chem Biol Drug Des. 2011 Jan;77(1):30-8. doi: 10.1111/j.1747-0285.2010.01059.x. Epub 2010 Nov 29.

Prediction of Activity Cliffs Using Condensed Graphs of Reaction Representations, Descriptor Recombination, Support Vector Machine Classification, and Support Vector Regression.

J Chem Inf Model. 2016 Sep 26;56(9):1631-40. doi: 10.1021/acs.jcim.6b00359. Epub 2016 Aug 26.

Prediction of compound potency changes in matched molecular pairs using support vector regression.

J Chem Inf Model. 2014 Oct 27;54(10):2654-63. doi: 10.1021/ci5003944. Epub 2014 Sep 17.

Rationalizing three-dimensional activity landscapes and the influence of molecular representations on landscape topology and the formation of activity cliffs.

J Chem Inf Model. 2010 Jun 28;50(6):1021-33. doi: 10.1021/ci100091e.

Evolution of Support Vector Machine and Regression Modeling in Chemoinformatics and Drug Discovery.

J Comput Aided Mol Des. 2022 May;36(5):355-362. doi: 10.1007/s10822-022-00442-9. Epub 2022 Mar 19.

Cited by

Developing an advanced prediction model for new employee turnover intention utilizing machine learning techniques.

Sci Rep. 2024 Jan 12;14(1):1221. doi: 10.1038/s41598-023-50593-4.

Using Machine Learning Algorithms to Pool Data from Meta-Analysis for the Prediction of Countermovement Jump Improvement.

Int J Environ Res Public Health. 2023 May 19;20(10):5881. doi: 10.3390/ijerph20105881.

Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation.

J Cheminform. 2023 Apr 28;15(1):49. doi: 10.1186/s13321-023-00709-9.

Predicting Potent Compounds Using a Conditional Variational Autoencoder Based upon a New Structure-Potency Fingerprint.

Biomolecules. 2023 Feb 18;13(2):393. doi: 10.3390/biom13020393.

Trajectory tracking of changes digital divide prediction factors in the elderly through machine learning.

PLoS One. 2023 Feb 10;18(2):e0281291. doi: 10.1371/journal.pone.0281291. eCollection 2023.

Detection of outliers in high-dimensional data using -support vector regression.

J Appl Stat. 2021 Apr 8;49(10):2550-2569. doi: 10.1080/02664763.2021.1911965. eCollection 2022.

Evolution of Support Vector Machine and Regression Modeling in Chemoinformatics and Drug Discovery.

J Comput Aided Mol Des. 2022 May;36(5):355-362. doi: 10.1007/s10822-022-00442-9. Epub 2022 Mar 19.

Predicting student satisfaction of emergency remote learning in higher education during COVID-19 using machine learning techniques.

PLoS One. 2021 Apr 2;16(4):e0249423. doi: 10.1371/journal.pone.0249423. eCollection 2021.

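Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction.

ACS Omega. 2017 Oct 31;2(10):6371-6379. doi: 10.1021/acsomega.7b01079. Epub 2017 Oct 4.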

A link prediction approach to cancer drug sensitivity prediction.

BMC Syst Biol. 2017 Oct 3;11(Suppl 5):94. doi: 10.1186/s12918-017-0463-8.

References

QSAR modeling: where have you been? Where are you going to?

J Med Chem. 2014 Jun 26;57(12):4977-5010. doi: 10.1021/jm4004285. Epub 2014 Jan 6.

Recent progress in understanding activity cliffs and their utility in medicinal chemistry.

J Med Chem. 2014 Jan 9;57(1):18-28. doi: 10.1021/jm401120g. Epub 2013 Sep 13.

Quantifying the fingerprint descriptor dependence of structure-activity relationship information on a large scale.

J Chem Inf Model. 2013 Sep 23;53(9):2275-81. doi: 10.1021/ci4004078. Epub 2013 Sep 6.

Quantitative structure-activity relationship models of clinical pharmacokinetics: clearance and volume of distribution.

J Chem Inf Model. 2013 Apr 22;53(4):948-57. doi: 10.1021/ci400001u. Epub 2013 Mar 15.

Machine learning methods for property prediction in chemoinformatics: Quo Vadis?

J Chem Inf Model. 2012 Jun 25;52(6):1413-37. doi: 10.1021/ci200409x. Epub 2012 May 25.

Chemoinformatics: a view of the field and current trends in method development.

Bioorg Med Chem. 2012 Sep 15;20(18):5317-23. doi: 10.1016/j.bmc.2012.03.030. Epub 2012 Mar 23.

ChEMBL: a large-scale bioactivity database for drug discovery.

Nucleic Acids Res. 2012 Jan;40(Database issue):D1100-7. doi: 10.1093/nar/gkr777. Epub 2011 Sep 23.

Activity landscape representations for structure-activity relationship analysis.

J Med Chem. 2010 Dec 9;53(23):8209-23. doi: 10.1021/jm100933w. Epub 2010 Sep 16.

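Rationalizing three-dimensional activity landscapes and the influence of molecular representations on landscape topology and the formation of activity cliffs.

J Chem Inf Model. 2010 Jun 28;50(6):1021-33. doi: 10.1021/ci100091e.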

Extended-connectivity fingerprints.

J Chem Inf Model. 2010 May 24;50(5):742-54. doi: 10.1021/ci100050t.
