MIM-ML: A Novel Quantum Chemical Fragment-Based Random Forest Model for Accurate Prediction of NMR Chemical Shifts of Nucleic Acids.

Author information

Chandy Sruthy K, Raghavachari Krishnan

Affiliation

Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States.

Publication information

J Chem Theory Comput. 2023 Oct 10;19(19):6632-6642. doi: 10.1021/acs.jctc.3c00563. Epub 2023 Sep 13.

DOI:10.1021/acs.jctc.3c00563

PMID:37703522

Abstract

We developed a random forest machine learning (ML) model for the prediction of ¹H and ¹³C NMR chemical shifts of nucleic acids. Our ML model is trained entirely to reproduce computed chemical shifts obtained previously for 10 nucleic acids using a Molecules-in-Molecules (MIM) fragment-based density functional theory (DFT) protocol that includes microsolvation effects. The model uses structural descriptors as well as electronic descriptors from an inexpensive low-level semiempirical calculation (GFN2-xTB) and is trained on a relatively small number of DFT chemical shifts (2080 ¹H chemical shifts and 1780 ¹³C chemical shifts for the 10 nucleic acids). The ML model is then used to make chemical shift predictions for 8 new nucleic acids ranging in size from 600 to 900 atoms, and these predictions are compared directly to experimental data. Although no experimental data were used in the training, and even though the test set includes some nonstandard structures, the performance of our model is excellent (mean absolute deviation of 0.34 ppm for ¹H chemical shifts and 2.52 ppm for ¹³C chemical shifts for the test set). A simple analysis suggests that both structural and electronic descriptors are critical for achieving reliable predictions. This is the first attempt to combine ML with fragment-based DFT calculations to predict experimental chemical shifts accurately, making the MIM-ML model a valuable tool for NMR predictions of nucleic acids.
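The abstract reports model quality as a mean absolute deviation (MAD) between predicted and experimental chemical shifts. The following is a minimal sketch of that metric only, not the authors' code: the function name `mean_absolute_deviation` and the four shift values are hypothetical, invented purely for illustration.

```python
from statistics import fmean


def mean_absolute_deviation(predicted, reference):
    """Return the average of |predicted - reference| over paired shifts (ppm)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one chemical shift is required")
    # Pair each predicted shift with its reference value and average the
    # absolute differences.
    return fmean([abs(p - r) for p, r in zip(predicted, reference)])


# Hypothetical 1H chemical shifts in ppm (illustrative values, not from the paper).
predicted_h = [7.9, 5.6, 4.4, 2.1]
experimental_h = [7.5, 5.9, 4.4, 2.4]

mad_h = mean_absolute_deviation(predicted_h, experimental_h)
print(f"MAD = {mad_h:.2f} ppm")
```

Applied separately to the ¹H and ¹³C predictions of a test set, a function of this kind would yield one MAD value per nucleus, which is the form in which the abstract quotes its 0.34 ppm and 2.52 ppm figures.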

Similar articles

Accurate and Cost-Effective NMR Chemical Shift Predictions for Nucleic Acids Using a Molecules-in-Molecules Fragmentation-Based Method.

J Chem Theory Comput. 2023 Jan 11. doi: 10.1021/acs.jctc.2c00967.

Fragment-Based Approach for the Evaluation of NMR Chemical Shifts for Large Biomolecules Incorporating the Effects of the Solvent Environment.

J Chem Theory Comput. 2017 Mar 14;13(3):1147-1158. doi: 10.1021/acs.jctc.6b00922. Epub 2017 Feb 14.

General Protocol for the Accurate Prediction of Molecular ¹³C/¹H NMR Chemical Shifts via Machine Learning Augmented DFT.

J Chem Inf Model. 2020 Aug 24;60(8):3746-3754. doi: 10.1021/acs.jcim.0c00388. Epub 2020 Jul 20.

Accurate and cost-effective NMR chemical shift predictions for proteins using a molecules-in-molecules fragmentation-based method.

Phys Chem Chem Phys. 2020 Dec 16;22(47):27781-27799. doi: 10.1039/d0cp05064d.

Prediction of ¹⁵N chemical shifts by machine learning.

Magn Reson Chem. 2022 Nov;60(11):1087-1092. doi: 10.1002/mrc.5208. Epub 2021 Aug 30.

Computation of CCSD(T)-Quality NMR Chemical Shifts via Δ-Machine Learning from DFT.

J Chem Theory Comput. 2023 Jun 27;19(12):3601-3615. doi: 10.1021/acs.jctc.3c00165. Epub 2023 Jun 1.

Toward Accurate Predictions of Atomic Properties via Quantum Mechanics Descriptors Augmented Graph Convolutional Neural Network: Application of This Novel Approach in NMR Chemical Shifts Predictions.

J Phys Chem Lett. 2020 Nov 19;11(22):9812-9818. doi: 10.1021/acs.jpclett.0c02654. Epub 2020 Nov 5.

Predicting ¹⁹⁵Pt NMR Chemical Shifts in Water-Soluble Inorganic/Organometallic Complexes with a Fast and Simple Protocol Combining Semiempirical Modeling and Machine Learning.

Chemphyschem. 2023 Jun 1;24(11):e202200940. doi: 10.1002/cphc.202200940. Epub 2023 Mar 20.

Real-time prediction of ¹H and ¹³C chemical shifts with DFT accuracy using a 3D graph neural network.

Chem Sci. 2021 Aug 9;12(36):12012-12026. doi: 10.1039/d1sc03343c. eCollection 2021 Sep 22.

Cited by

Developing a Machine Learning Model for Hydrogen Bond Acceptance Based on Natural Bond Orbital Descriptors.

J Org Chem. 2025 Jul 18;90(28):9776-9788. doi: 10.1021/acs.joc.5c00724. Epub 2025 Jul 6.
