
Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications.

Authors

Chen Chia-Hsiu, Tanaka Kenichi, Kotera Masaaki, Funatsu Kimito

Affiliation

Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.

Publication

J Cheminform. 2020 Mar 30;12(1):19. doi: 10.1186/s13321-020-0417-9.

DOI: 10.1186/s13321-020-0417-9
PMID: 33430997
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7106596/
Abstract

Ensemble learning improves machine learning results by combining several models, yielding better predictive performance than a single model. It also benefits and accelerates research in quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) modeling. With the growing number of ensemble learning models such as random forest, the effectiveness of QSAR/QSPR will be limited by the machine's inability to explain its predictions to researchers. In fact, many implementations of ensemble learning models can quantify the overall magnitude of each feature's contribution. For example, feature importance allows us to assess the relative importance of features and to interpret the predictions. However, different ensemble learning methods or implementations may select different features for interpretation. In this paper, we compared the predictability and interpretability of four well-established ensemble learning models (random forest, extremely randomized trees, adaptive boosting and gradient boosting) for regression and binary classification modeling tasks. We then built blending methods by combining the four ensemble learning methods. The blending method led to better performance and a unified interpretation by combining the individual predictions of the different learning models. The important features of two case studies, which gave us valuable information about compound properties, are discussed in detail in this report. QSPR modeling with interpretable machine learning techniques can help chemical design work more efficiently, confirm hypotheses and establish knowledge for better results.
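
The blending idea in the abstract can be sketched in a few lines. The abstract does not say how the four models are combined, so this sketch assumes the simplest option: an unweighted average of the models' predictions, and an unweighted average of their feature importances after each model's importances are rescaled to sum to 1. The function names, descriptor names (`logP`, `mol_weight`, `n_rings`) and all numbers are hypothetical and are not taken from the paper. In practice the inputs might come from fitted models, for example the `feature_importances_` attribute of scikit-learn's tree ensembles.

```python
from statistics import mean


def blend_predictions(per_model_preds):
    """Unweighted blend: average the models' predictions for each sample."""
    return [mean(sample) for sample in zip(*per_model_preds.values())]


def blend_importances(per_model_imps):
    """Rescale each model's importances to sum to 1, then average per feature."""
    features = next(iter(per_model_imps.values())).keys()
    blended = {}
    for feature in features:
        shares = [
            imps[feature] / sum(imps.values())
            for imps in per_model_imps.values()
        ]
        blended[feature] = mean(shares)
    return blended


# Hypothetical numbers for illustration only; they are not results from the paper.
predictions = {
    "random_forest": [1.0, 2.0],
    "extra_trees": [1.2, 1.8],
    "adaboost": [0.8, 2.2],
    "gradient_boosting": [1.0, 2.0],
}
importances = {
    "random_forest": {"logP": 0.50, "mol_weight": 0.30, "n_rings": 0.20},
    "extra_trees": {"logP": 0.40, "mol_weight": 0.40, "n_rings": 0.20},
    "adaboost": {"logP": 0.60, "mol_weight": 0.10, "n_rings": 0.30},
    "gradient_boosting": {"logP": 0.70, "mol_weight": 0.20, "n_rings": 0.10},
}

blended_preds = blend_predictions(predictions)
blended_imps = blend_importances(importances)

# One shared ranking of features instead of four possibly conflicting ones.
ranking = sorted(blended_imps, key=blended_imps.get, reverse=True)
print(blended_preds)
print(ranking)
```

Rescaling before averaging keeps any one model from dominating the blended ranking just because its importance scores are on a larger scale. A weighted average, for instance by validation performance, would be an equally plausible design.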



Similar articles

1
Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications.
J Cheminform. 2020 Mar 30;12(1):19. doi: 10.1186/s13321-020-0417-9.
2
Random Forest Approach to QSPR Study of Fluorescence Properties Combining Quantum Chemical Descriptors and Solvent Conditions.
J Fluoresc. 2018 Mar;28(2):695-706. doi: 10.1007/s10895-018-2233-4. Epub 2018 Apr 22.
3
Comprehensive ensemble in QSAR prediction for drug discovery.
BMC Bioinformatics. 2019 Oct 26;20(1):521. doi: 10.1186/s12859-019-3135-4.
4
Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.
J Chem Inf Model. 2017 Aug 28;57(8):1773-1792. doi: 10.1021/acs.jcim.6b00753. Epub 2017 Aug 2.
5
Greedy and Linear Ensembles of Machine Learning Methods Outperform Single Approaches for QSPR Regression Problems.
Mol Inform. 2015 Sep;34(9):634-47. doi: 10.1002/minf.201400122. Epub 2015 Mar 25.
6
Random forest: a classification and regression tool for compound classification and QSAR modeling.
J Chem Inf Comput Sci. 2003 Nov-Dec;43(6):1947-58. doi: 10.1021/ci034160g.
7
Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab377.
8
An ensemble model of QSAR tools for regulatory risk assessment.
J Cheminform. 2016 Sep 22;8:48. doi: 10.1186/s13321-016-0164-0. eCollection 2016.
9
Random generalized linear model: a highly accurate and interpretable ensemble predictor.
BMC Bioinformatics. 2013 Jan 16;14:5. doi: 10.1186/1471-2105-14-5.
10
Contemporary QSAR classifiers compared.
J Chem Inf Model. 2007 Jan-Feb;47(1):219-27. doi: 10.1021/ci600332j.

Cited by

1
A Machine Learning Approach for Predicting the Pure-Component Surface Tension of Atmospherically Relevant Organic Compounds.
ACS EST Air. 2025 Apr 8;2(5):808-823. doi: 10.1021/acsestair.4c00291. eCollection 2025 May 9.
2
Enhancing fever of unknown origin diagnosis: machine learning approaches to predict metagenomic next-generation sequencing positivity.
Front Cell Infect Microbiol. 2025 Apr 15;15:1550933. doi: 10.3389/fcimb.2025.1550933. eCollection 2025.
3
Research on the optimization model of anti-breast cancer candidate drugs based on machine learning.
Front Genet. 2025 Apr 10;16:1523015. doi: 10.3389/fgene.2025.1523015. eCollection 2025.
4
Predicting the academic achievement of students using black hole optimization and Gaussian process regression.
Sci Rep. 2025 Mar 28;15(1):10809. doi: 10.1038/s41598-025-86261-y.
5
Predictive model for abdominal liposuction volume in patients with obesity using machine learning in a longitudinal multi-center study in Korea.
Sci Rep. 2024 Nov 30;14(1):29791. doi: 10.1038/s41598-024-79654-y.
6
Application of hybridized ensemble learning and equilibrium optimization in estimating damping ratios of municipal solid waste.
Sci Rep. 2024 Jul 30;14(1):17584. doi: 10.1038/s41598-024-67381-3.
7
Mining Bovine Milk Proteins for DPP-4 Inhibitory Peptides Using Machine Learning and Virtual Proteolysis.
Research (Wash D C). 2024 Jun 17;7:0391. doi: 10.34133/research.0391. eCollection 2024.
8
Designing Sustainable Hydrophilic Interfaces via Feature Selection from Molecular Descriptors and Time-Domain Nuclear Magnetic Resonance Relaxation Curves.
Polymers (Basel). 2024 Mar 15;16(6):824. doi: 10.3390/polym16060824.
9
Research on predicting the driving forces of digital transformation in Chinese media companies based on machine learning.
Sci Rep. 2024 Mar 27;14(1):7286. doi: 10.1038/s41598-024-57873-7.
10
Beyond Amyloid: A Machine Learning-Driven Approach Reveals Properties of Potent GSK-3β Inhibitors Targeting Neurofibrillary Tangles.
Int J Mol Sci. 2024 Feb 24;25(5):2646. doi: 10.3390/ijms25052646.

References

1
Random Forest Model with Combined Features: A Practical Approach to Predict Liquid-crystalline Property.
Mol Inform. 2019 Apr;38(4):e1800095. doi: 10.1002/minf.201800095. Epub 2018 Dec 7.
2
Random Forest Approach to QSPR Study of Fluorescence Properties Combining Quantum Chemical Descriptors and Solvent Conditions.
J Fluoresc. 2018 Mar;28(2):695-706. doi: 10.1007/s10895-018-2233-4. Epub 2018 Apr 22.
3
Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.
J Chem Inf Model. 2017 Aug 28;57(8):1773-1792. doi: 10.1021/acs.jcim.6b00753. Epub 2017 Aug 2.
4
Interpretable Decision Sets: A Joint Framework for Description and Prediction.
KDD. 2016 Aug;2016:1675-1684. doi: 10.1145/2939672.2939874.
5
Machine-learning-assisted materials discovery using failed experiments.
Nature. 2016 May 5;533(7601):73-6. doi: 10.1038/nature17439.
6
Machine learning methods in chemoinformatics.
Wiley Interdiscip Rev Comput Mol Sci. 2014 Sep 1;4(5):468-481. doi: 10.1002/wcms.1183.
7
Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons.
J Cheminform. 2013 Feb 11;5(1):9. doi: 10.1186/1758-2946-5-9.
8
Per aspera ad astra: application of Simplex QSAR approach in antiviral research.
Future Med Chem. 2010 Jul;2(7):1205-26. doi: 10.4155/fmc.10.194.
9
What is solvatochromism?
J Phys Chem B. 2010 Dec 30;114(51):17128-35. doi: 10.1021/jp1097487. Epub 2010 Dec 3.
10
Application of random forest approach to QSAR prediction of aquatic toxicity.
J Chem Inf Model. 2009 Nov;49(11):2481-8. doi: 10.1021/ci900203n.