High-confidence structural annotation of metabolites absent from spectral libraries.

Affiliations

Chair for Bioinformatics, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany.

International Max Planck Research School 'Exploration of Ecological Interactions with Molecular and Chemical Techniques', Max Planck Institute for Chemical Ecology, Jena, Germany.

Publication information

Nat Biotechnol. 2022 Mar;40(3):411-421. doi: 10.1038/s41587-021-01045-9. Epub 2021 Oct 14.

Abstract

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.
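The abstract names kernel density P value estimation as one ingredient of the confidence score but gives no implementation details. As a rough illustration of the general technique only, and not of COSMIC's actual code, the sketch below estimates an upper-tail P value for a score against a Gaussian kernel density fitted to a set of null scores. The function name `kde_p_value`, the use of Silverman's rule for the bandwidth, and the example scores are all assumptions made for this illustration.

```python
import math
import statistics


def kde_p_value(score, null_scores, bandwidth=None):
    """Upper-tail P value of `score` under a Gaussian KDE of `null_scores`.

    Hypothetical helper for illustration; not taken from COSMIC.
    """
    n = len(null_scores)
    if n < 2:
        raise ValueError("need at least two null scores")
    if bandwidth is None:
        # Silverman's rule of thumb: an assumed default, not the paper's choice.
        bandwidth = 1.06 * statistics.stdev(null_scores) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # A Gaussian KDE is an equal-weight mixture of normals centred on the
    # null scores, so its survival function is the mean of the per-kernel
    # upper tails: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    scale = bandwidth * math.sqrt(2)
    tails = (0.5 * math.erfc((score - x) / scale) for x in null_scores)
    return sum(tails) / n


# Made-up null scores, symmetric around zero.
null = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_mid = kde_p_value(0.0, null)
p_high = kde_p_value(10.0, null)
```

A score far above the null distribution receives a P value close to zero, while a score at the centre of a symmetric null distribution receives a P value of about 0.5.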

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb8d/8926923/66d6c7638aff/41587_2021_1045_Fig1_HTML.jpg
