Melo Diego Ulysses, Carneiro Leonardo Martins, Coutinho-Neto Mauricio Domingues, Homem-de-Mello Paula, Bartoloni Fernando Heering
Centro de Ciências Naturais e Humanas, Universidade Federal do ABC, Santo André, São Paulo 09210-580, Brazil.
J Org Chem. 2025 Jul 18;90(28):9776-9788. doi: 10.1021/acs.joc.5c00724. Epub 2025 Jul 6.
This study uses machine learning (ML) to assess how well electronic descriptors derived from natural bond orbital (NBO) analysis predict hydrogen bond acceptance. Using a data set of 979 hydrogen bond complexes, each formed by a hydrogen bond acceptor and 4-fluorophenol as the donor, we optimized geometries with GFN2-xTB and then ran DFT single-point calculations. NBO analysis of these calculations was used to extract intramolecular donor-acceptor interactions, particularly the orbital stabilization energies, E(2), which reflect electron delocalization and relate to canonical resonance structures. The E(2) values served as features to train seven ML models based on different techniques: KNN, Decision Tree, SVM, RF, MLP, XGBoost, and CatBoost. To our knowledge, this is the first work to use E(2) as a standalone ML descriptor for hydrogen bond acceptance. Even with a small set of descriptors, we achieved high predictive performance, with errors below 0.4 kcal/mol, surpassing previous studies that used heterogeneous descriptors, including quantum-chemical data. Our results highlight the utility of NBO-based features in building accurate, physically meaningful, and generalizable ML models for pKBHX prediction.
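As a rough illustration of the descriptor-to-model step, the sketch below fits the simplest of the listed techniques (KNN regression) to a few numeric features and reports a hold-out mean absolute error. It is not the authors' pipeline: the data are synthetic stand-ins for NBO stabilization energies, and the function names, feature count, value ranges, `k`, and split fraction are all hypothetical choices made for this example.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict a target as the mean of the k nearest training targets (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def make_synthetic_dataset(n_samples=200, n_features=4, seed=0):
    """Build placeholder data: random feature vectors and a noisy linear target.

    ASSUMPTION: these numbers only mimic the shape of a descriptor table;
    they are not real NBO energies or measured hydrogen bond basicities.
    """
    rng = random.Random(seed)
    weights = [rng.uniform(0.01, 0.05) for _ in range(n_features)]
    X, y = [], []
    for _ in range(n_samples):
        features = [rng.uniform(0.0, 40.0) for _ in range(n_features)]
        target = sum(w * f for w, f in zip(weights, features)) + rng.gauss(0.0, 0.1)
        X.append(features)
        y.append(target)
    return X, y


def holdout_mae(X, y, k=3, test_fraction=0.2, seed=0):
    """Shuffle, hold out a fraction of samples, and score KNN predictions on them."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    predictions = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
    return mean_absolute_error([y[i] for i in test_idx], predictions)


if __name__ == "__main__":
    X, y = make_synthetic_dataset()
    print(f"hold-out MAE on synthetic data: {holdout_mae(X, y):.3f}")
```

In practice one would likely use a library such as scikit-learn with cross-validation and feature scaling; the dependency-free version here only makes the train, predict, and score steps explicit.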