Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models.

Affiliations

School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China.

Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun, 130024, China.

Publication information

J Chem Inf Model. 2019 May 28;59(5):1849-1857. doi: 10.1021/acs.jcim.8b00878. Epub 2019 Apr 8.

DOI: 10.1021/acs.jcim.8b00878
PMID: 30912940

Abstract

Machine learning has exhibited powerful capabilities in many areas. However, machine learning models are mostly database dependent, requiring a new model if the database changes. Therefore, a universal model is highly desired to accommodate the widest variety of databases. Fortunately, this universality may be achieved by ensemble learning, which can integrate multiple learners to meet the demands of diversified databases. Therefore, we propose a general procedure for learning ensemble establishment based on noncovalent interactions (NCIs) databases. Additionally, accurate NCI computation is quite demanding for first-principles methods, for which a competent machine learning model can be an efficient solution to obtain high NCI accuracy with minimal computational resources. In regard to these aspects, multiple schemes of ensemble learning models (Bagging, Boosting, and Stacking frameworks) are explored in this study. The models are based on various low levels of density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. All NCIs computed by the DFT calculations can be improved to high-level accuracy (root-mean-square error RMSE = 0.22 kcal/mol in contrast to CCSD(T)/CBS benchmark) by established ensemble learning models. Compared with single machine learning models, ensemble models show better accuracy (RMSE of the best model is further lowered by ∼25%), robustness and goodness-of-fit according to evaluation parameters suggested by the OECD. Among ensemble learning models, heterogeneous Stacking ensemble models show the most valuable application potential. The standardized procedure of constructing learning ensembles has been well utilized on several NCI data sets, and this procedure may also be applicable for other chemical databases.
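To make the idea of an ensemble correction concrete, the following is a minimal sketch of a Bagging-style corrector, not the authors' implementation. It assumes synthetic interaction energies with an invented systematic DFT error, a single input feature (the low-level DFT energy), and a simple least-squares line as the base learner. The abstract does not specify the descriptors or base models, so all names and numbers here are hypothetical.

```python
import random

def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    return mean((p - r) ** 2 for p, r in zip(predicted, reference)) ** 0.5

def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def bagged_correction(dft, ref, n_models=25, seed=0):
    """Fit one linear corrector per bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    n = len(dft)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        xs = [dft[i] for i in idx]
        if max(xs) == min(xs):  # skip a degenerate resample (single distinct point)
            continue
        models.append(fit_line(xs, [ref[i] for i in idx]))

    def predict(x):
        return mean(a * x + b for a, b in models)

    return predict

# Synthetic data (assumption): "reference" stands in for CCSD(T)/CBS energies in
# kcal/mol, and "dft" carries an invented scale error, offset, and small noise.
data_rng = random.Random(42)
reference = [-0.5 - 0.3 * i for i in range(30)]
dft = [0.85 * e + 0.4 + data_rng.uniform(-0.1, 0.1) for e in reference]

train_idx = range(0, 30, 2)  # even entries train the ensemble
test_idx = range(1, 30, 2)   # odd entries are held out for evaluation

correct = bagged_correction([dft[i] for i in train_idx],
                            [reference[i] for i in train_idx])

raw_rmse = rmse([dft[i] for i in test_idx], [reference[i] for i in test_idx])
corrected_rmse = rmse([correct(dft[i]) for i in test_idx],
                      [reference[i] for i in test_idx])
print(f"raw RMSE: {raw_rmse:.3f} kcal/mol, corrected RMSE: {corrected_rmse:.3f} kcal/mol")
```

A heterogeneous Stacking ensemble, which the abstract reports as the most promising variant, would instead train several different base learners and fit a meta-learner on their held-out predictions. Bagging is shown here only because it is the simplest of the three frameworks to write without external libraries.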

Similar articles

1. Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models.
   J Chem Inf Model. 2019 May 28;59(5):1849-1857. doi: 10.1021/acs.jcim.8b00878. Epub 2019 Apr 8.
2. A machine learning correction for DFT non-covalent interactions based on the S22, S66 and X40 benchmark databases.
   J Cheminform. 2016 May 3;8:24. doi: 10.1186/s13321-016-0133-7. eCollection 2016.
3. Calculations on noncovalent interactions and databases of benchmark interaction energies.
   Acc Chem Res. 2012 Apr 17;45(4):663-72. doi: 10.1021/ar200255p. Epub 2012 Jan 6.
4. A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.
   J Med Syst. 2017 Nov 9;41(12):201. doi: 10.1007/s10916-017-0853-x.
5. Exploiting ensemble learning to improve prediction of phospholipidosis inducing potential.
   J Theor Biol. 2019 Oct 21;479:37-47. doi: 10.1016/j.jtbi.2019.07.009. Epub 2019 Jul 13.
6. A comparative study of heterogeneous and homogeneous ensemble approaches for landslide susceptibility assessment in the Djebahia region, Algeria.
   Environ Sci Pollut Res Int. 2024 Jun;31(28):40554-40580. doi: 10.1007/s11356-023-26247-3. Epub 2023 Mar 9.
7. Forecasting Corn Yield With Machine Learning Ensembles.
   Front Plant Sci. 2020 Jul 31;11:1120. doi: 10.3389/fpls.2020.01120. eCollection 2020.
8. Toward a less costly but accurate calculation of the CCSD(T)/CBS noncovalent interaction energy.
   J Comput Chem. 2020 May 15;41(13):1252-1260. doi: 10.1002/jcc.26171. Epub 2020 Feb 11.
9. Stacked Ensemble Machine Learning for Range-Separation Parameters.
   J Phys Chem Lett. 2021 Oct 7;12(39):9516-9524. doi: 10.1021/acs.jpclett.1c02506. Epub 2021 Sep 24.
10. Optimized stacking, a new method for constructing ensemble surrogate models applied to DNAPL-contaminated aquifer remediation.
    J Contam Hydrol. 2021 Dec;243:103914. doi: 10.1016/j.jconhyd.2021.103914. Epub 2021 Oct 28.

Cited by

1. Comprehensive Study of the Chemistry behind the Stability of Carboxylic SWCNT Dispersions in the Development of a Transparent Electrode.
   Nanomaterials (Basel). 2022 Jun 1;12(11):1901. doi: 10.3390/nano12111901.
2. A protocol for investigating lipidomic dysregulation and discovering lipid biomarkers from human serums.
   STAR Protoc. 2022 Feb 2;3(1):101125. doi: 10.1016/j.xpro.2022.101125. eCollection 2022 Mar 18.
3. Machine Learning of Serum Metabolic Patterns Encodes Asymptomatic SARS-CoV-2 Infection.
   Front Chem. 2021 Oct 1;9:746134. doi: 10.3389/fchem.2021.746134. eCollection 2021.
4. Recent Advances in In Silico Target Fishing.
   Molecules. 2021 Aug 24;26(17):5124. doi: 10.3390/molecules26175124.
5. Distinct lipid metabolic dysregulation in asymptomatic COVID-19.
   iScience. 2021 Sep 24;24(9):102974. doi: 10.1016/j.isci.2021.102974. Epub 2021 Aug 11.
6. LogP prediction performance with the SMD solvation model and the M06 density functional family for SAMPL6 blind prediction challenge molecules.
   J Comput Aided Mol Des. 2020 May;34(5):511-522. doi: 10.1007/s10822-020-00278-1. Epub 2020 Jan 14.
7. STarFish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products.
   J Chem Inf Model. 2019 Nov 25;59(11):4906-4920. doi: 10.1021/acs.jcim.9b00489. Epub 2019 Oct 24.