Fast metabolite identification with Input Output Kernel Regression.

Authors

Brouard Céline, Shen Huibin, Dührkop Kai, d'Alché-Buc Florence, Böcker Sebastian, Rousu Juho

Affiliations

Department of Computer Science, Aalto University, Espoo, Finland.

Helsinki Institute for Information Technology, Espoo, Finland.

Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i28-i36. doi: 10.1093/bioinformatics/btw246.

DOI: 10.1093/bioinformatics/btw246
PMID: 27307628
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4908330/
Abstract

MOTIVATION

An important problem in metabolomics is the identification of metabolites from tandem mass spectrometry data. Machine learning methods have recently been proposed to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output spaces and can handle structured output spaces such as the space of molecules.
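The fingerprint-matching strategy mentioned above can be illustrated with a minimal sketch. It assumes binary fingerprints and uses Tanimoto similarity as the matching score; the abstract does not say which score the earlier methods use, and the candidate names and fingerprint values are made up for illustration.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprint vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # assumed convention for two all-zero fingerprints
    return float(np.logical_and(a, b).sum() / union)

# Hypothetical fingerprint predicted from a spectrum (illustrative values).
predicted = [1, 0, 1, 1, 0, 0]

# Hypothetical structure database: candidate name -> fingerprint.
database = {
    "candidate_A": [1, 0, 1, 0, 0, 0],
    "candidate_B": [1, 0, 1, 1, 0, 0],
    "candidate_C": [0, 1, 0, 0, 1, 1],
}

# Rank database entries by similarity to the predicted fingerprint.
ranking = sorted(
    database,
    key=lambda name: tanimoto(predicted, database[name]),
    reverse=True,
)
print(ranking)
```

In this setting the identification quality depends on how well each fingerprint bit is predicted, which is the vector-output restriction the structured output approach is meant to lift.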

RESULTS

We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. The method approximates the spectra-molecule mapping in two phases. The first phase is a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, which consists of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method reduces the running time of both the training step and the test step by several orders of magnitude compared with preceding methods.
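The two phases can be sketched on toy data. This is an illustration under stated assumptions, not the authors' implementation: the regression phase is taken to be kernel ridge regression with its closed-form solution, both kernels are Gaussian, the preimage is searched over a small candidate set, and the function names, the regularisation value and all data are hypothetical. Toy vectors stand in for spectra and binary vectors stand in for molecules.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit(K_x, lam):
    """Phase 1 (assumed kernel ridge regression into the output feature space).

    Returns M = (K_x + lam * I)^{-1}. The predicted output feature vector for
    a new input is a weighted sum of training output features with weights
    M @ k_x_new, so it never has to be formed explicitly.
    """
    n = K_x.shape[0]
    return np.linalg.inv(K_x + lam * np.eye(n))

def rank_candidates(M, k_x_new, K_y_cand):
    """Phase 2 (preimage): score each candidate against the prediction.

    k_x_new:  input-kernel values between the new input and training inputs.
    K_y_cand: output-kernel values, one row per candidate and one column per
              training output.
    """
    scores = K_y_cand @ (M @ k_x_new)
    return np.argsort(-scores), scores

# Toy training set: four well-separated inputs, each paired with one output.
X_train = 3.0 * np.eye(4)
Y_train = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
], dtype=float)

# New input close to the third training input; its true output is Y_train[2].
x_new = X_train[2] + 0.1

# Candidate outputs; index 1 is the true one, the others are distractors.
candidates = np.array([
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

M = fit(gaussian_kernel(X_train, X_train, gamma=1.0), lam=0.1)
k_x_new = gaussian_kernel(X_train, x_new[None, :], gamma=1.0)[:, 0]
K_y_cand = gaussian_kernel(candidates, Y_train, gamma=0.5)

order, scores = rank_candidates(M, k_x_new, K_y_cand)
print(order)  # candidate indices, best first
```

Scoring with inner products in the output feature space is enough to pick the best candidate when the output kernel is normalised, as the Gaussian kernel is. Training reduces to a single linear solve, and each test spectrum needs only matrix-vector products, which is consistent with the running-time advantage the abstract reports.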

CONTACT

celine.brouard@aalto.fi

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
