Cross-Modal Retrieval Between ¹³C NMR Spectra and Structures Based on Focused Libraries.

Affiliations

State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China.

Beijing Key Laboratory of Active Substances Discovery and Druggability Evaluation, Department of Medicinal Chemistry, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, PR China.

Publication information

Anal Chem. 2024 Apr 16;96(15):5763-5770. doi: 10.1021/acs.analchem.3c04294. Epub 2024 Apr 2.

Abstract

Library matching, which compares a carbon-13 nuclear magnetic resonance (¹³C NMR) spectrum with the spectral data in a library, is a crucial method for compound identification. In our previous paper, we introduced a deep contrastive learning system called CReSS, which used a library that contained more structures. However, CReSS has two limitations: the library contained no unknown structures, and a redundant library reduces the structure-elucidation accuracy. Herein, we replaced the oversized traditional libraries with focused libraries containing a small number of molecules. A previously reported generative model, CMGNet, was used to generate focused libraries for CReSS. The combined model achieved a Top-10 accuracy of 54.03% when tested on 6,471 ¹³C NMR spectra. In comparison, CReSS with a random reference structure library achieved an accuracy of only 9.17%. Furthermore, to extend the advantages of the focused libraries, we proposed SAmpRNN, a recurrent neural network (RNN). With the larger focused library amplified by SAmpRNN, the structure-identification accuracy of the model increased in 70.0% of the 30 randomly selected example cases. Overall, cross-modal retrieval between ¹³C NMR spectra and structures based on focused libraries (CFLS) achieved high accuracy and provided more accurate candidate structures than traditional libraries for compound identification.
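As a rough illustration of how a Top-10 retrieval accuracy of this kind can be computed, the sketch below ranks the structures in a per-spectrum focused library by cosine similarity to the spectrum embedding and checks whether the correct structure is among the first k. It is not the CReSS or CMGNet implementation: the function names, the use of cosine similarity, and the assumption that spectra and structures are already embedded as fixed-length vectors are all hypothetical choices made for the example.

```python
import numpy as np

def top_k_hit(query_vec, library_vecs, true_index, k=10):
    """Return True if the true structure ranks within the top k candidates.

    query_vec    : 1-D embedding of one spectrum (hypothetical encoder output)
    library_vecs : 2-D array, one row per candidate structure embedding
    true_index   : row of library_vecs holding the correct structure
    """
    # Normalise so that the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    lib = library_vecs / np.linalg.norm(library_vecs, axis=1, keepdims=True)
    scores = lib @ q
    # Indices of the k highest-scoring candidates.
    top_k = np.argsort(-scores)[:k]
    return bool(true_index in top_k)

def top_k_accuracy(query_vecs, libraries, true_indices, k=10):
    """Fraction of spectra whose true structure is retrieved in the top k.

    Each spectrum is paired with its own focused library (an assumption
    mirroring the per-query libraries described in the abstract).
    """
    hits = [
        top_k_hit(q, lib, t, k)
        for q, lib, t in zip(query_vecs, libraries, true_indices)
    ]
    return sum(hits) / len(hits)
```

Under this scheme a smaller, better-targeted library leaves fewer competing candidates to outrank the correct structure, which is the intuition behind replacing a large generic library with a focused one.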

