MIM-ML: A Novel Quantum Chemical Fragment-Based Random Forest Model for Accurate Prediction of NMR Chemical Shifts of Nucleic Acids.

Authors

Chandy Sruthy K, Raghavachari Krishnan

Affiliation

Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States.

Publication information

J Chem Theory Comput. 2023 Oct 10;19(19):6632-6642. doi: 10.1021/acs.jctc.3c00563. Epub 2023 Sep 13.

Abstract

We developed a random forest machine learning (ML) model for the prediction of ¹H and ¹³C NMR chemical shifts of nucleic acids. The model is trained entirely to reproduce chemical shifts computed previously for 10 nucleic acids using a Molecules-in-Molecules (MIM) fragment-based density functional theory (DFT) protocol that includes microsolvation effects. It uses structural descriptors as well as electronic descriptors from an inexpensive low-level semiempirical calculation (GFN2-xTB), and is trained on a relatively small number of DFT chemical shifts (2080 ¹H and 1780 ¹³C chemical shifts across the 10 nucleic acids). The ML model is then used to predict chemical shifts for 8 new nucleic acids ranging in size from 600 to 900 atoms, and the predictions are compared directly to experimental data. Although no experimental data were used in training, the model performs well on the test set (mean absolute deviation of 0.34 ppm for ¹H and 2.52 ppm for ¹³C chemical shifts), even though the test set includes some nonstandard structures. A simple analysis suggests that both structural and electronic descriptors are critical for achieving reliable predictions. This is the first attempt to combine ML with fragment-based DFT calculations to predict experimental chemical shifts accurately, making the MIM-ML model a valuable tool for NMR predictions of nucleic acids.
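The accuracy figures in the abstract are mean absolute deviations (MAD) between predicted and experimental chemical shifts. The sketch below shows one common way to compute such a metric, assuming it is the plain average of the absolute differences over paired shifts. The function name and the numbers are hypothetical placeholders for illustration, not data or code from the paper.

```python
from statistics import fmean


def mean_absolute_deviation(predicted, reference):
    """Return the mean of |predicted - reference| over paired shifts (ppm)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one chemical shift is required")
    return fmean(abs(p - r) for p, r in zip(predicted, reference))


# Placeholder values for illustration only; these are NOT from the paper.
predicted_h = [7.5, 5.5, 8.0, 4.0]      # hypothetical predicted 1H shifts (ppm)
experimental_h = [7.0, 6.0, 8.0, 4.5]   # hypothetical experimental 1H shifts (ppm)

mad_h = mean_absolute_deviation(predicted_h, experimental_h)
print(f"1H MAD: {mad_h:.2f} ppm")
```

Computing the metric separately for each nucleus, as the abstract does, keeps the much wider ¹³C shift range from dominating a combined number.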
