

High-confidence structural annotation of metabolites absent from spectral libraries.

Affiliations

Chair for Bioinformatics, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany.

International Max Planck Research School 'Exploration of Ecological Interactions with Molecular and Chemical Techniques', Max Planck Institute for Chemical Ecology, Jena, Germany.

Publication information

Nat Biotechnol. 2022 Mar;40(3):411-421. doi: 10.1038/s41587-021-01045-9. Epub 2021 Oct 14.

Abstract

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.
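The abstract names kernel density P value estimation as one ingredient of the confidence score but does not spell out the procedure. The following is a minimal sketch of the general idea only, not the COSMIC implementation: it fits a Gaussian kernel density to a set of background scores and reports the upper-tail probability of a hit score. The function name `kde_p_value`, the choice of background scores, the Gaussian kernel, and the Silverman bandwidth rule are all assumptions made for illustration.

```python
import math
import statistics


def kde_p_value(hit_score, background_scores, bandwidth=None):
    """Upper-tail P value of hit_score under a Gaussian KDE of background_scores.

    Hypothetical helper for illustration; not taken from COSMIC.
    """
    n = len(background_scores)
    if n < 2:
        raise ValueError("need at least two background scores")
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption, not from the paper).
        sigma = statistics.stdev(background_scores)
        bandwidth = 1.06 * sigma * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    def upper_tail(z):
        # P(Z >= z) for a standard normal variable.
        return 0.5 * math.erfc(z / math.sqrt(2))

    # The KDE is a mixture of Gaussians centred on each background score,
    # so its tail probability is the mean of the component tail probabilities.
    return sum(upper_tail((hit_score - x) / bandwidth) for x in background_scores) / n


if __name__ == "__main__":
    # Made-up scores, purely to show the call.
    background = [0.10, 0.15, 0.22, 0.30, 0.35, 0.41]
    print(kde_p_value(0.80, background))
```

A smaller value means the hit score lies further out in the upper tail of the background distribution. How such a P value is combined with the support vector machine is not described in the abstract and is not attempted here.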


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb8d/8926923/66d6c7638aff/41587_2021_1045_Fig1_HTML.jpg
