Noise Filtering Algorithm Using Gaussian Mixture Models for High-Resolution Mass Spectra of Natural Organic Matter.

Authors

Potemkin Alexander A, Proskurnin Mikhail A, Volkov Dmitry S

Affiliation

Chemistry Department of M.V. Lomonosov Moscow State University, Leninskie Gory, 1-3, GSP-1, Moscow 119991, Russia.

Publication details

Anal Chem. 2024 Apr 9;96(14):5455-5461. doi: 10.1021/acs.analchem.3c05453. Epub 2024 Mar 26.

Abstract

High-resolution mass spectra of natural organic matter (NOM) contain a large number of noise signals. These signals interfere with the correct molecular composition estimation during nontargeted analysis because formula-assignment programs find empirical formulas for such peaks as well. Previously proposed noise filtering methods that utilize the profile of the intensity distribution of mass spectrum peaks rely on a histogram to calculate the intensity threshold value. However, the histogram profile can vary depending on the user settings. In addition, these algorithms are not automated, so they are handled manually. To overcome the mentioned drawbacks, we propose a new algorithm for noise filtering in mass spectra. This filter is based on Gaussian Mixture Models (GMMs), a machine learning method to find the intensity threshold value. The algorithm is completely data-driven and eliminates the need to work with a histogram. It has no customizable parameters and automatically determines the noise level for each individual mass spectrum. The algorithm performance was tested on mass spectra of natural organic matter obtained by averaging a different number of microscans (transients), and the results were compared with other noise filters proposed in the literature. Finally, the effect of this noise filtering approach on the fraction of peaks with assigned formulas was investigated. It was shown that there is always an increase in the identification rate, but the magnitude of the effect changes with the number of microscans averaged. The increase can be as high as 15%.
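
The abstract does not give implementation details, so the following is only a minimal sketch of how a GMM-derived intensity threshold could be computed, not the authors' algorithm. It assumes that peak intensities are modeled on a log10 scale, that exactly two Gaussian components are used (a low-intensity "noise" population and a higher-intensity "signal" population), and that the threshold is placed where the two weighted densities cross. The function names (`fit_two_gaussians`, `noise_threshold`), the initialization, and the fixed iteration count are all hypothetical choices made for illustration.

```python
import math


def _pdf(x, mean, var):
    """Density of the normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def _moments(values):
    """Mean and (floored) variance of a non-empty list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)


def fit_two_gaussians(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture to x with plain EM."""
    # Assumed initialization: split the data at the midpoint of its range.
    mid = (min(x) + max(x)) / 2.0
    low = [v for v in x if v <= mid]
    high = [v for v in x if v > mid]
    if not high:
        raise ValueError("need at least two distinct values")
    n = len(x)
    weights = [len(low) / n, len(high) / n]
    (m0, v0), (m1, v1) = _moments(low), _moments(high)
    means, variances = [m0, m1], [v0, v1]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for every point.
        resp = []
        for v in x:
            p0 = weights[0] * _pdf(v, means[0], variances[0])
            p1 = weights[1] * _pdf(v, means[1], variances[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0.0 else 0.5)
        # M-step: re-estimate weight, mean and variance of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - q for q in resp]
            total = max(sum(r), 1e-12)
            weights[k] = total / n
            means[k] = sum(q * v for q, v in zip(r, x)) / total
            var = sum(q * (v - means[k]) ** 2 for q, v in zip(r, x)) / total
            variances[k] = max(var, 1e-6)
    return weights, means, variances


def noise_threshold(intensities):
    """Return an intensity cutoff separating the two fitted components."""
    logs = [math.log10(i) for i in intensities if i > 0]
    w, m, v = fit_two_gaussians(logs)
    # The component with the lower mean is treated as noise (an assumption).
    lo, hi = (0, 1) if m[0] < m[1] else (1, 0)

    def signal_minus_noise(t):
        return w[hi] * _pdf(t, m[hi], v[hi]) - w[lo] * _pdf(t, m[lo], v[lo])

    # Bisection between the two means for the point where the weighted
    # densities cross; assumes the components are reasonably well separated.
    a, b = m[lo], m[hi]
    for _ in range(60):
        mid = (a + b) / 2.0
        if signal_minus_noise(mid) > 0.0:
            b = mid
        else:
            a = mid
    return 10.0 ** ((a + b) / 2.0)
```

In this sketch, peaks whose intensity falls below `noise_threshold(peak_intensities)` would be discarded before formula assignment. No histogram is built at any point, which mirrors the motivation given in the abstract, but the number of components and the log transform remain assumptions that a real implementation would need to justify.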
