Combining Experimental with Computational Infrared and Mass Spectra for High-Throughput Nontargeted Chemical Structure Identification.

Author information

Karunaratne Erandika, Hill Dennis W, Dührkop Kai, Böcker Sebastian, Grant David F

Affiliations

Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States.

Chair for Bioinformatics, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena 07743, Germany.

Publication information

Anal Chem. 2023 Aug 15;95(32):11901-11907. doi: 10.1021/acs.analchem.3c00937. Epub 2023 Aug 4.

Abstract

The inability to identify the structures of most metabolites detected in environmental or biological samples limits the utility of nontargeted metabolomics. The most widely used analytical approaches combine mass spectrometry and machine learning methods to rank candidate structures contained in large chemical databases. Given the large chemical space typically searched, the use of additional orthogonal data may improve the identification rates and reliability. Here, we present results of combining experimental and computational mass and IR spectral data for high-throughput nontargeted chemical structure identification. Experimental MS/MS and gas-phase IR data for 148 test compounds were obtained from NIST. Candidate structures for each of the test compounds were obtained from PubChem (mean = 4444 candidate structures per test compound). Our workflow used CSI:FingerID to initially score and rank the candidate structures. The top 1000 ranked candidates were subsequently used for IR spectra prediction, scoring, and ranking using density functional theory (DFT-IR). Final ranking of the candidates was based on a composite score calculated as the average of the CSI:FingerID and DFT-IR rankings. This approach resulted in the correct identification of 88 of the 148 test compounds (59%). 129 of the 148 test compounds (87%) were ranked within the top 20 candidates. These identification rates are the highest yet reported when candidate structures are used from PubChem. Combining experimental and computational MS/MS and IR spectral data is a potentially powerful option for prioritizing candidates for final structure verification.
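The ranking step described in the abstract (shortlist the top 1000 candidates by CSI:FingerID, rank those by DFT-IR, then average the two rankings) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the candidate IDs, the score values, the convention that a higher score is better, and the tie-breaking by candidate ID are all assumptions made for illustration. The abstract specifies only that the composite score is the average of the two rankings.

```python
from typing import Dict, List, Tuple


def rank_descending(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each candidate to a 1-based rank; higher score gets a lower rank number.

    Assumption: ties are broken alphabetically by candidate ID.
    """
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return {cid: position for position, cid in enumerate(ordered, start=1)}


def composite_ranking(
    ms_scores: Dict[str, float],
    ir_scores: Dict[str, float],
    top_n: int = 1000,
) -> List[Tuple[str, float]]:
    """Return (candidate, composite score) pairs, best candidate first.

    ms_scores: hypothetical MS/MS-based scores for all candidates.
    ir_scores: hypothetical IR-match scores; needed only for the shortlist.
    """
    ms_rank = rank_descending(ms_scores)
    # Only the top_n candidates by MS/MS rank go on to the IR step.
    shortlist = [cid for cid, rank in ms_rank.items() if rank <= top_n]
    ir_rank = rank_descending({cid: ir_scores[cid] for cid in shortlist})
    # Composite score = average of the two rank positions (lower is better).
    composite = {cid: (ms_rank[cid] + ir_rank[cid]) / 2 for cid in shortlist}
    return sorted(composite.items(), key=lambda item: (item[1], item[0]))


# Toy data (invented): four candidates, shortlist of three.
ms_scores = {"A": 0.90, "B": 0.80, "C": 0.70, "D": 0.10}
ir_scores = {"A": 0.20, "B": 0.95, "C": 0.60}
ranking = composite_ranking(ms_scores, ir_scores, top_n=3)
print(ranking)  # [('B', 1.5), ('A', 2.0), ('C', 2.5)]
```

In the toy data, candidate B is second by MS/MS score but first by IR score, so its averaged rank (1.5) moves it ahead of A (2.0). This shows how an orthogonal measurement can reorder candidates that the MS/MS data alone ranks in a different order.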
