West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, CA, USA.
Olobion, Parc Científic de Barcelona, Barcelona, Spain.
Nat Methods. 2021 Dec;18(12):1524-1531. doi: 10.1038/s41592-021-01331-z. Epub 2021 Dec 2.
Compound identification in small-molecule research, such as untargeted metabolomics or exposome research, relies on matching tandem mass spectrometry (MS/MS) spectra against experimental or in silico mass spectral libraries. Most software programs use dot product similarity scores. Here we introduce the concept of MS/MS spectral entropy to improve scoring results in MS/MS similarity searches via library matching. Entropy similarity outperformed 42 alternative similarity algorithms, including dot product similarity, when searching 434,287 spectra against the high-quality NIST20 library. Entropy similarity scores proved to be highly robust even when we added different levels of noise ions. When we applied entropy levels to 37,299 experimental spectra of natural products, false discovery rates of less than 10% were observed at entropy similarity score 0.75. Experimental human gut metabolome data were used to confirm that entropy similarity largely improved the accuracy of MS-based annotations in small-molecule research to false discovery rates below 10%, annotated new compounds and provided the basis to automatically flag poor-quality, noisy spectra.
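The abstract names spectral entropy and entropy similarity without stating formulas. The sketch below illustrates the idea under several assumptions that are not stated in the abstract: spectral entropy is taken as the Shannon entropy of the intensity-normalized peaks, and entropy similarity as 1 - (2·S_AB - S_A - S_B) / ln 4, where S_AB is the entropy of the 1:1 merged spectrum. The function names are hypothetical, peaks are assumed to be already aligned (identical m/z keys denote the same fragment ion), and any intensity weighting or m/z-tolerance matching a real implementation would apply is omitted. It is not the authors' implementation.

```python
import math


def normalize(peaks):
    """Scale intensities so they sum to 1; `peaks` maps m/z -> intensity (non-empty)."""
    total = sum(peaks.values())
    return {mz: i / total for mz, i in peaks.items() if i > 0}


def spectral_entropy(peaks):
    """Shannon entropy (natural log) of the normalized peak intensities."""
    p = normalize(peaks)
    return -sum(v * math.log(v) for v in p.values())


def entropy_similarity(a, b):
    """Unweighted entropy similarity in [0, 1] (assumed form, see lead-in).

    Assumes peaks are pre-aligned: equal m/z keys are treated as the same ion.
    """
    pa, pb = normalize(a), normalize(b)
    # 1:1 mixture of the two normalized spectra.
    merged = {
        mz: 0.5 * pa.get(mz, 0.0) + 0.5 * pb.get(mz, 0.0)
        for mz in set(pa) | set(pb)
    }
    s_ab = spectral_entropy(merged)
    return 1.0 - (2 * s_ab - spectral_entropy(pa) - spectral_entropy(pb)) / math.log(4)


if __name__ == "__main__":
    query = {85.03: 120.0, 127.04: 300.0, 145.05: 80.0}      # illustrative values only
    reference = {85.03: 100.0, 127.04: 320.0, 163.06: 40.0}  # illustrative values only
    print(spectral_entropy(query))
    print(entropy_similarity(query, reference))
```

With this form, two identical spectra score 1 and two spectra with no shared peaks score 0, because merging disjoint spectra raises the entropy by exactly ln 2.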