

Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications.

Authors

Chen Chia-Hsiu, Tanaka Kenichi, Kotera Masaaki, Funatsu Kimito

Affiliation

Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.

Publication

J Cheminform. 2020 Mar 30;12(1):19. doi: 10.1186/s13321-020-0417-9.

Abstract

Ensemble learning improves machine learning results by combining several models, yielding better predictive performance than a single model. It also benefits and accelerates research in quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) modeling. With the growing number of ensemble learning models such as random forest, the effectiveness of QSAR/QSPR will be limited by the models' inability to explain their predictions to researchers. In fact, many implementations of ensemble learning models can quantify the overall contribution of each feature. For example, feature importance allows us to assess the relative importance of features and to interpret the predictions. However, different ensemble learning methods or implementations may select different features for interpretation. In this paper, we compared the predictability and interpretability of four well-established ensemble learning models (random forest, extremely randomized trees, adaptive boosting, and gradient boosting) for regression and binary classification modeling tasks. We then built blending methods by combining the four ensemble learning methods. The blending method led to better performance and a unified interpretation by summarizing the individual predictions of the different learning models. The important features identified in two case studies, which provided valuable information about compound properties, are discussed in detail in this report. QSPR modeling with interpretable machine learning techniques can help chemical design proceed more efficiently, confirm hypotheses, and establish knowledge for better results.
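The abstract does not say how the four models are blended, so the following is only a minimal sketch under a stated assumption: equal-weight averaging of per-sample predictions and of per-model feature importances after each importance vector is normalized to sum to 1. The function names and all numbers are hypothetical. In practice the importance vectors might come from something like the `feature_importances_` attribute that scikit-learn's tree ensembles expose.

```python
from statistics import mean


def normalize(importances):
    """Scale a non-negative importance vector so that it sums to 1."""
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    return [value / total for value in importances]


def blend_importances(importances_by_model):
    """Average the normalized importances feature by feature (equal weights)."""
    normalized = [normalize(v) for v in importances_by_model.values()]
    return [mean(column) for column in zip(*normalized)]


def blend_predictions(predictions_by_model):
    """Average the per-sample predictions of all models (equal weights)."""
    return [mean(column) for column in zip(*predictions_by_model.values())]


# Hypothetical importances for three descriptors from four ensemble models.
importances = {
    "random_forest": [0.5, 0.3, 0.2],
    "extra_trees": [0.4, 0.4, 0.2],
    "adaboost": [6.0, 3.0, 1.0],  # deliberately unnormalized
    "gradient_boosting": [0.7, 0.2, 0.1],
}

# Hypothetical regression predictions for two compounds.
predictions = {
    "random_forest": [1.0, 2.0],
    "extra_trees": [1.2, 1.8],
    "adaboost": [0.8, 2.2],
    "gradient_boosting": [1.0, 2.0],
}

blended_importance = blend_importances(importances)
blended_prediction = blend_predictions(predictions)

# Descriptor indices ordered from most to least important after blending.
ranking = sorted(
    range(len(blended_importance)),
    key=lambda i: blended_importance[i],
    reverse=True,
)

print(blended_importance, blended_prediction, ranking)
```

Normalizing before averaging keeps a model that reports importances on a larger scale from dominating the blended ranking. Weighted averaging, for example by validation score, would be an equally plausible choice.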


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1291/7106596/16f7626b52b3/13321_2020_417_Fig1_HTML.jpg
