
Comparative Analysis of Binary Similarity Measures for Compound Identification in Mass Spectrometry-Based Metabolomics

Authors

Kim Seongho, Kato Ikuko, Zhang Xiang

Affiliations

Biostatistics and Bioinformatics Core, Karmanos Cancer Institute, Department of Oncology, School of Medicine, Wayne State University, Detroit, MI 48201, USA.

Department of Oncology and Pathology, School of Medicine, Wayne State University, Detroit, MI 48201, USA.

Publication information

Metabolites. 2022 Jul 26;12(8):694. doi: 10.3390/metabo12080694.

DOI: 10.3390/metabo12080694
PMID: 35893261
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9394311/
Abstract

Compound identification is a critical step in untargeted metabolomics. Its most important procedure is calculating the similarity between experimental mass spectra and either predicted mass spectra or mass spectra in a mass spectral library. Whereas continuous similarity measures have been evaluated, no study has assessed the performance of binary similarity measures in compound identification, even though the well-known Jaccard similarity measure has been widely used without proper evaluation. The objective of this study is thus to evaluate the performance of binary similarity measures for compound identification in untargeted metabolomics. Fifteen binary similarity measures, including the Jaccard, Dice, Sokal-Sneath, Cosine, and Simpson measures, were selected, and their performance in compound identification was assessed using both electron ionization (EI) and electrospray ionization (ESI) mass spectra. Our theoretical evaluations show that compound identification accuracy is exactly the same among the Jaccard, Dice, 3W-Jaccard, Sokal-Sneath, and Kulczynski measures, between the Cosine and Hellinger measures, and between the McConnaughey and Driver-Kroeber measures; these equivalences were confirmed empirically using mass spectral libraries. In the mass spectrum-based evaluation, the best-performing similarity measures were the McConnaughey and Driver-Kroeber measures for EI mass spectra and the Cosine and Hellinger measures for ESI mass spectra. The most robust similarity measure was the Fager-McGowan measure, which was the second-best performing measure for both EI and ESI mass spectra.
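The abstract describes identification as scoring an experimental spectrum against library spectra, but does not give the binarization step or the formulas. As a minimal sketch under stated assumptions, the Python below reduces each spectrum to the set of nominal m/z values that carry a peak, counts a (peaks in both spectra), b (peaks only in the query), and c (peaks only in the library entry), and ranks library entries with nine of the fifteen measures, using formulas as they are commonly written in surveys of binary similarity coefficients. The helper names (`binarize`, `counts`, `identify`), the relative-intensity threshold, the choice of the Kulczynski-I variant, and the toy spectra are all hypothetical; Hellinger and Fager-McGowan are left out rather than guessing the paper's exact forms. Written this way, Jaccard, Dice, 3W-Jaccard, Sokal-Sneath, and Kulczynski-I are all increasing functions of a/(b + c), and McConnaughey equals 2 × Driver-Kroeber - 1, which is one way to see why those measures yield identical identification accuracy.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

# 2x2 contingency counts for a binarized query X and library spectrum Y:
#   a = peaks in both, b = peaks only in X, c = peaks only in Y.
# The shared-absence count d is not used by any measure below.
Counts = Tuple[int, int, int]


def binarize(spectrum: Dict[float, float], rel_threshold: float = 0.0) -> Set[int]:
    """Hypothetical preprocessing: keep nominal m/z values whose intensity,
    relative to the base peak, exceeds rel_threshold."""
    if not spectrum:
        return set()
    base = max(spectrum.values())
    if base <= 0:
        return set()
    return {round(mz) for mz, inten in spectrum.items() if inten / base > rel_threshold}


def counts(x: Set[int], y: Set[int]) -> Counts:
    return len(x & y), len(x - y), len(y - x)


def _kulczynski1(a: int, b: int, c: int) -> float:
    # a / (b + c); identical non-empty peak sets (b + c == 0) score +inf, the best value
    return a / (b + c) if b + c else math.inf


# Formulas as commonly given in surveys of binary coefficients (an assumption;
# the paper's exact definitions may differ). Every denominator is positive
# as long as both peak sets are non-empty, which identify() guarantees.
MEASURES: Dict[str, Callable[[int, int, int], float]] = {
    "jaccard": lambda a, b, c: a / (a + b + c),
    "dice": lambda a, b, c: 2 * a / (2 * a + b + c),
    "3w_jaccard": lambda a, b, c: 3 * a / (3 * a + b + c),
    "sokal_sneath": lambda a, b, c: a / (a + 2 * (b + c)),
    "kulczynski1": _kulczynski1,
    "cosine": lambda a, b, c: a / math.sqrt((a + b) * (a + c)),
    "simpson": lambda a, b, c: a / min(a + b, a + c),
    "driver_kroeber": lambda a, b, c: 0.5 * (a / (a + b) + a / (a + c)),
    "mcconnaughey": lambda a, b, c: (a * a - b * c) / ((a + b) * (a + c)),
}


def identify(query: Set[int], library: Dict[str, Set[int]], measure: str) -> List[Tuple[str, float]]:
    """Score every non-empty library spectrum against the query and return
    (name, score) pairs, best match first (ties broken by name)."""
    if not query:
        raise ValueError("query spectrum has no peaks")
    score = MEASURES[measure]
    scored = [(name, score(*counts(query, peaks))) for name, peaks in library.items() if peaks]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy data for illustration only; these are not real spectra.
query = binarize({41.0: 20.0, 43.0: 100.0, 57.1: 35.0, 71.0: 10.0, 85.0: 5.0})
library = {
    "compound_A": {41, 43, 57, 71},
    "compound_B": {43, 57, 85, 99, 113},
    "compound_C": {41, 43},
}

if __name__ == "__main__":
    for name in MEASURES:
        print(f"{name:>15}: {[hit for hit, _ in identify(query, library, name)]}")
```

With this toy library, the five Jaccard-type measures rank compound_B ahead of compound_C, whereas Cosine, Driver-Kroeber, and McConnaughey put compound_C ahead of compound_B. Measures within one of the equivalence groups reported above always produce the same ordering, while measures from different groups can disagree below the top hit.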


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/550f/9394311/db3c8b09719e/metabolites-12-00694-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/550f/9394311/ab9082762db4/metabolites-12-00694-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/550f/9394311/82dca6c75236/metabolites-12-00694-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/550f/9394311/2743b79141fc/metabolites-12-00694-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/550f/9394311/b0570a1d12ad/metabolites-12-00694-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/550f/9394311/130f79ba6550/metabolites-12-00694-g006.jpg


