MAD HATTER Correctly Annotates 98% of Small Molecule Tandem Mass Spectra Searching in PubChem.

Authors

Hoffmann Martin A, Kretschmer Fleming, Ludwig Marcus, Böcker Sebastian

Affiliation

Chair for Bioinformatics, Institute for Computer Science, Friedrich-Schiller-University Jena, 07743 Jena, Germany.

Publication

Metabolites. 2023 Feb 21;13(3):314. doi: 10.3390/metabo13030314.

DOI:10.3390/metabo13030314
PMID:36984753
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10053663/
Abstract

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet, after more than a decade of method development, in silico methods still do not reach the correct annotation rates that users would wish for. Here, we present a novel computational method called Mad Hatter for this task. Mad Hatter combines CSI:FingerID results with information from the searched structure database via a metascore. Compound information includes the melting point, and the number of words in the compound description starting with the letter 'u'. We then show that Mad Hatter reaches a stunning 97.6% correct annotations when searching PubChem, one of the largest and most comprehensive molecular structure databases. Unfortunately, Mad Hatter is not a real method. Rather, we developed Mad Hatter solely for the purpose of demonstrating common issues in computational method development and evaluation. We explain what evaluation glitches were necessary for Mad Hatter to reach this annotation level, what is wrong with similar metascores in general, and why metascores may screw up not only method evaluations but also the analysis of biological experiments. This paper may serve as an example of problems in the development and evaluation of machine learning models for metabolite annotation.
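
The abstract's concern about metascores can be illustrated with a toy example. The sketch below is not the Mad Hatter implementation: every name, score, word count, and the weight are hypothetical, and the linear combination is only one assumed form a metascore could take. It also assumes that correct answers in a benchmark tend to be well-documented reference compounds, while a genuinely new compound has almost no database metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    spectrum_score: float    # hypothetical score derived from the spectrum alone
    description_words: int   # hypothetical database metadata, unrelated to the spectrum
    is_correct: bool


def spectrum_only(c: Candidate) -> float:
    return c.spectrum_score


def metascore(c: Candidate, weight: float = 0.05) -> float:
    # Assumed form: spectrum score plus a weighted metadata term.
    return c.spectrum_score + weight * c.description_words


def top1_accuracy(queries, key) -> float:
    # Fraction of queries whose top-ranked candidate is the correct one.
    hits = sum(1 for cands in queries if max(cands, key=key).is_correct)
    return hits / len(queries)


# Benchmark-like queries: the correct structure is a well-documented compound.
benchmark = [
    [Candidate("ref_A", 0.60, 40, True),
     Candidate("decoy_1", 0.70, 2, False),
     Candidate("decoy_2", 0.50, 0, False)],
    [Candidate("ref_B", 0.80, 55, True),
     Candidate("decoy_3", 0.75, 1, False)],
    [Candidate("ref_C", 0.40, 30, True),
     Candidate("decoy_4", 0.65, 3, False),
     Candidate("decoy_5", 0.55, 5, False)],
]

# Queries for sparsely documented compounds: a well-known decoy is among the candidates.
novel = [
    [Candidate("new_X", 0.90, 0, True),
     Candidate("famous_1", 0.50, 60, False)],
    [Candidate("new_Y", 0.85, 1, True),
     Candidate("famous_2", 0.45, 45, False)],
]

for label, queries in (("benchmark", benchmark), ("novel", novel)):
    base = top1_accuracy(queries, spectrum_only)
    meta = top1_accuracy(queries, metascore)
    print(f"{label}: spectrum only {base:.2f}, metascore {meta:.2f}")
```

In this toy data the metadata term lifts benchmark accuracy while overriding strong spectral evidence for the sparsely documented compounds. That is one way a metascore could look excellent in evaluation yet mislead in a real experiment, under the assumptions stated above.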

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cbcf/10053663/964ae8881c0e/metabolites-13-00314-g001.jpg
