McGlynn Deborah F, Yee Lindsay D, Garraffo H Martin, Geer Lewis Y, Mak Tytus D, Mirokhin Yuri A, Tchekhovskoi Dmitrii V, Jen Coty N, Goldstein Allen H, Kearsley Anthony J, Stein Stephen E
Applied and Computational Mathematics Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States.
Department of Environmental Science, Policy, & Management, University of California at Berkeley, Berkeley, California 94720, United States.
J Am Soc Mass Spectrom. 2025 Feb 5;36(2):389-399. doi: 10.1021/jasms.4c00451. Epub 2025 Jan 13.
While gas chromatography mass spectrometry (GC-MS) has long been used to identify compounds in complex mixtures, this process is often subjective and time-consuming and leaves a large fraction of seemingly good-quality spectra unidentified. In this work, we describe a set of new mass spectral library-based methods to assist compound identification in complex mixtures. These methods employ mass spectral uniqueness and compound ubiquity of library entries alongside noise reduction and automated comparison of retention indices to library compounds. As a test data set, we used a publicly available electron ionization mass spectrometry data set consisting of 4833 spectra of particulate organic compounds emitted by combustion of wildland fuels. In the present work, spectra in this data set were first identified using the NIST 2023 EI-MS Library and associated batch process identification software (NIST MS PepSearch) using retention-index corrected Identity Search scoring. Resulting identifications and related information were then employed to parametrize other factors that correlate with identification. A method for identifying compounds absent from but related to those present in mass spectral libraries using the Hybrid Similarity Search is illustrated. Nevertheless, some 90% of the spectra remain unidentified. Through comparison of unidentified to identified mass spectra in this data set, a new simple measure, namely median relative abundance, was developed for evaluating the likelihood of identification.
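The abstract names median relative abundance as a simple per-spectrum measure but does not define it. The following is a minimal sketch under one plausible reading: each peak intensity is expressed as a percentage of the base (most intense) peak, and the median is taken over all peaks with positive intensity. The function name `median_relative_abundance`, the 0 to 100 scaling, and the exclusion of zero-intensity peaks are assumptions made for illustration, not the authors' stated procedure.

```python
from statistics import median


def median_relative_abundance(intensities):
    """Median peak intensity as a percentage of the base peak.

    Assumed definition (not taken from the paper): zero or negative
    intensities are ignored, and the base peak is scaled to 100.
    """
    peaks = [i for i in intensities if i > 0]
    if not peaks:
        raise ValueError("spectrum has no peaks with positive intensity")
    base = max(peaks)
    return median(100.0 * i / base for i in peaks)


# Hypothetical peak intensities for a single spectrum.
example_spectrum = [10, 50, 100, 200]
print(median_relative_abundance(example_spectrum))
```

The abstract does not state which range of this value indicates a spectrum that is likely to be identified, so no threshold is suggested here.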