Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia.
J Am Chem Soc. 2022 Aug 17;144(32):14590-14606. doi: 10.1021/jacs.2c03631. Epub 2022 Aug 8.
Mass spectrometry (MS) is a convenient, highly sensitive, and reliable method for analyzing complex mixtures, which is vital for materials science, for life sciences fields such as metabolomics and proteomics, and for mechanistic research in chemistry. Although MS is one of the most powerful methods for detecting individual compounds, complete signal assignment in complex mixtures remains a great challenge. An unconstrained formula-generating algorithm that covers the entire spectrum and reveals its components would be a "dream tool" for researchers. We present a framework for efficient MS data interpretation, describing a novel approach to detailed analysis based on deisotoping performed by gradient-boosted decision trees and a neural network that generates molecular formulas from the fine isotopic structure, thereby approaching the long-standing inverse spectral problem. The methods were successfully tested on three examples: fragment ion analysis in protein sequencing for proteomics, analysis of natural samples for life sciences, and study of a cross-coupling catalytic system for chemistry.
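The abstract frames formula generation as an inverse problem. The forward direction, predicting the fine isotopic structure of a known formula, is comparatively simple, and it helps show what the neural network must learn to invert. The following is a minimal illustrative sketch of that forward direction only, not the authors' code. It assumes neutral molecules, a small hypothetical element table with IUPAC isotope masses and abundances, atom-by-atom convolution, merging of peaks whose masses agree to 0.1 mDa, and pruning of peaks below an absolute abundance of 1e-6. The function name `fine_isotope_pattern` and its parameters are hypothetical.

```python
from collections import defaultdict

# Hypothetical element table: (exact isotope mass in Da, natural abundance), IUPAC values.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548, 0.0107)],
    "H": [(1.00782503, 0.999885), (2.01410178, 0.000115)],
    "N": [(14.0030740, 0.99636), (15.0001089, 0.00364)],
    "O": [(15.9949146, 0.99757), (16.9991317, 0.00038), (17.9991596, 0.00205)],
    "S": [(31.9720707, 0.9499), (32.9714585, 0.0075),
          (33.9678669, 0.0425), (35.9670808, 0.0001)],
}


def fine_isotope_pattern(formula, min_abundance=1e-6, decimals=4):
    """Forward problem: formula {element: count} -> sorted [(mass, abundance)].

    Peaks are convolved one atom at a time. Peaks whose masses round to the same
    value at `decimals` places are merged, and each merged peak keeps its
    abundance-weighted mean mass so rounding error does not accumulate.
    """
    peaks = [(0.0, 1.0)]  # start from an empty molecule: mass 0, probability 1
    for element, count in formula.items():
        for _ in range(count):
            # bin key -> [total abundance, sum of abundance * mass]
            bins = defaultdict(lambda: [0.0, 0.0])
            for mass, prob in peaks:
                for iso_mass, iso_prob in ISOTOPES[element]:
                    m, p = mass + iso_mass, prob * iso_prob
                    b = bins[round(m, decimals)]
                    b[0] += p
                    b[1] += p * m
            # Recover each bin's weighted mean mass and drop negligible peaks.
            peaks = [(wm / p, p) for p, wm in bins.values() if p >= min_abundance]
    return sorted(peaks)


if __name__ == "__main__":
    # Glucose, C6H12O6: print the fine structure of the M+1 cluster.
    for mass, ab in fine_isotope_pattern({"C": 6, "H": 12, "O": 6}):
        if 181.0 < mass < 181.2:
            print(f"{mass:.5f}  {ab:.5f}")
```

In the M+1 cluster of glucose, the 13C, 17O, and 2H contributions appear as separate peaks a few mDa apart. This fine structure is the signal from which, according to the abstract, the neural network generates formulas.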