Predicting HOMO-LUMO Gaps Using Hartree-Fock Calculated Data and Machine Learning Models.

Authors

Hasan Md Mehedi, Tarkhaneh Omid, Bungay Sharene D, Poirier Raymond A, Islam Shahidul M

Affiliations

Department of Chemistry, Delaware State University, Dover, Delaware 19901, United States.

Department of Computer Science, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador A1B 3X5, Canada.

Publication information

J Chem Inf Model. 2025 Sep 22;65(18):9497-9515. doi: 10.1021/acs.jcim.5c01412. Epub 2025 Sep 10.

DOI: 10.1021/acs.jcim.5c01412
PMID: 40929702

Abstract

The calculation of the highest occupied molecular orbital-lowest unoccupied molecular orbital (HOMO-LUMO) gap for chemical molecules is computationally intensive using quantum mechanics (QM) methods, while experimental determination is often costly and time-consuming. Machine Learning (ML) offers a cost-effective and rapid alternative, enabling efficient predictions of HOMO-LUMO gap values across large data sets without the need for extensive QM computations or experiments. ML models facilitate the screening of diverse molecules, providing valuable insights into complex chemical spaces and integrating seamlessly into high-throughput workflows to prioritize candidates for experimental validation. In this study, we leveraged a data set of HOMO-LUMO gap values for small molecules obtained through Hartree-Fock (HF) calculations and developed ML models to predict HOMO-LUMO energy gaps for organic molecules. Molecular descriptors generated from Simplified Molecular Input Line Entry System (SMILES) representations using RDKit were used as input features to train various regression-based ML models. The data set included 46,717 small molecules with carbon chain number ranging from 1 to 8. Among the tested models, LightGBM regressor, Bidirectional LSTM, CatBoost regressor, and Multilayer Perceptron (MLP) achieved mean absolute error (MAE) values below 0.25 eV. Further improvement was achieved by creating a weighted ensemble model combining the LightGBM regressor, Bidirectional LSTM, and MLP, resulting in a prediction accuracy with an MAE of 0.1660 eV. This ensemble model outperformed others across various data sets, with the LightGBM regressor showing better performance for predicting the HOMO-LUMO gap of saturated linear molecules. SHAP analysis was conducted which identified 20 molecular descriptors critical for accurate predictions. Additionally, the models were empirically adapted to estimate experimental HOMO-LUMO gap values for both small and large molecules (up to carbon number 50), demonstrating their versatility and practical applicability.
