Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States.
Departments of Genetics and Biochemistry, Institute of Bioinformatics and Complex Carbohydrate Center, University of Georgia, 315 Riverbend Rd, Athens, Georgia 30602, United States.
Anal Chem. 2020 Aug 4;92(15):10412-10419. doi: 10.1021/acs.analchem.0c00768. Epub 2020 Jul 15.
A major challenge in metabolomic analysis is the unambiguous identification of the metabolites detected in a sample. Among metabolomics techniques, NMR spectroscopy is a sophisticated, powerful, and generally applicable tool for ascertaining the correct structure of newly isolated biogenic molecules. However, the accuracy of structure prediction using computational NMR techniques depends on how much of a compound's relevant conformational space is considered. Calculating NMR chemical shifts with high-level DFT is intrinsically challenging when the conformational space of a metabolite is extensive. In this work, we developed NMR chemical shift calculation protocols that use a machine learning model in conjunction with standard DFT methods. The pipeline encompasses the following steps: (1) conformation generation using a force field (FF)-based method, (2) filtering the FF-generated conformations using the ASE-ANI machine learning model, (3) clustering of the optimized conformations based on structural similarity to identify chemically unique conformations, (4) DFT structural optimization of the unique conformations, and (5) DFT NMR chemical shift calculation. The protocol can calculate the NMR chemical shifts of a set of molecules using any available combination of DFT level of theory, solvent model, and NMR-active nuclei, with user-selected reference compounds, linear regression methods, or both. It reduces the overall computational time by 2 orders of magnitude relative to methods that optimize the conformations using fully ab initio methods, while still producing good agreement with experimental observations. The complete protocol is designed to make the computation of chemical shifts tractable for a large number of conformationally flexible metabolites.
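The two referencing options named in the abstract (a reference compound and linear regression) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, the numbers are made-up illustrative values rather than results from the paper, and the sketch assumes the common conventions δ = σ_ref − σ for a reference compound and δ = slope·σ + intercept for a regression fitted against experimental shifts.

```python
from typing import List, Sequence, Tuple


def shifts_from_reference(sigma_calc: Sequence[float], sigma_ref: float) -> List[float]:
    """Convert computed isotropic shieldings to chemical shifts (ppm)
    using a reference compound: delta_i = sigma_ref - sigma_i."""
    return [sigma_ref - s for s in sigma_calc]


def fit_linear_scaling(sigma_calc: Sequence[float],
                       delta_exp: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of delta_exp = slope * sigma_calc + intercept."""
    n = len(sigma_calc)
    if n != len(delta_exp) or n < 2:
        raise ValueError("need at least two paired (shielding, shift) values")
    mean_x = sum(sigma_calc) / n
    mean_y = sum(delta_exp) / n
    sxx = sum((x - mean_x) ** 2 for x in sigma_calc)
    if sxx == 0.0:
        raise ValueError("all shieldings are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sigma_calc, delta_exp))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def shifts_from_regression(sigma_calc: Sequence[float],
                           slope: float, intercept: float) -> List[float]:
    """Apply a previously fitted linear scaling to computed shieldings."""
    return [slope * s + intercept for s in sigma_calc]


# Illustrative (made-up) 1H shieldings in ppm, not data from the paper.
sigma_example = [31.0, 29.5, 27.0]
sigma_reference = 31.5  # hypothetical shielding of the reference compound

delta_ref = shifts_from_reference(sigma_example, sigma_reference)

# Hypothetical training set used to fit the scaling parameters.
slope, intercept = fit_linear_scaling([31.0, 29.5, 27.0], [0.5, 2.0, 4.5])
delta_reg = shifts_from_regression(sigma_example, slope, intercept)
```

With a reference compound, systematic errors cancel only to the extent that they are similar for the reference and the target nuclei. A regression fitted on a training set can also absorb a slope that deviates from −1, at the cost of requiring experimental shifts for the fit.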