Guo Zixuan, Fan Yingjie, Yu Chuanxiu, Lu Hongmei, Zhang Zhimin
College of Chemistry and Chemical Engineering, Central South University, Changsha, Hunan 410083, China.
Anal Chem. 2024 Apr 16;96(15):5878-5886. doi: 10.1021/acs.analchem.3c05772. Epub 2024 Apr 1.
Gas chromatography-mass spectrometry (GC-MS) is one of the most important techniques for analyzing volatile organic compounds. However, the complexity of real samples and the limited separation capability of chromatography mean that coeluting compounds are often not fully separated. In this study, a Transformer-based automatic resolution method (GCMSFormer) is proposed to resolve mass spectra from GC-MS peaks in an end-to-end manner, predicting the mass spectra of the components directly from the raw overlapping-peak data. Furthermore, orthogonal projection resolution (OPR) was integrated into GCMSFormer to resolve minor components. The GCMSFormer model was trained, validated, and tested on 100,000 augmented data samples. It achieves a bilingual evaluation understudy (BLEU) score of 99.88% on the test set, significantly higher than the 97.68% achieved by the baseline long short-term memory (LSTM) sequence-to-sequence model. GCMSFormer was also compared with two non-deep-learning resolution tools (MZmine and AMDIS) and two deep learning resolution tools (PARAFAC2 with DL and MSHub/GNPS) on a real plant essential oil GC-MS data set. The resolution results were compared on five evaluation metrics: number of compounds resolved, mass spectral match score, correlation coefficient, explained variance, and resolution speed. The results demonstrate that GCMSFormer offers better resolution performance, a higher degree of automation, and faster resolution. In summary, GCMSFormer is an end-to-end, fast, fully automatic, and accurate method for analyzing GC-MS data of complex samples.
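To make the ideas of orthogonal projection and explained variance more concrete, the following is a minimal sketch on synthetic data. It is not the GCMSFormer implementation: the function names (project_out, explained_variance), the Gaussian elution profiles, and the random spectra are all assumptions made for illustration, and explained variance is computed here as an uncentered sum-of-squares ratio, which may differ from the definition used in the paper. The sketch projects the spectrum of an already resolved major component out of an overlapping-peak matrix, so that whatever remains points to a minor coeluting component.

```python
import numpy as np

def project_out(X, S):
    """Remove from each row of X the part lying in the row space of S.

    X: (n_scans, n_mz) data matrix; S: (k, n_mz) resolved spectra.
    """
    # Orthogonal projector onto the complement of span(rows of S): I - S^+ S
    P = np.eye(S.shape[1]) - np.linalg.pinv(S) @ S
    return X @ P

def explained_variance(X, residual):
    """Fraction of the (uncentered) sum of squares of X that is not residual."""
    return 1.0 - np.sum(residual ** 2) / np.sum(X ** 2)

# Synthetic overlapping peak: one major and one minor component (hypothetical data).
rng = np.random.default_rng(0)
n_scans, n_mz = 40, 60
t = np.arange(n_scans)
major = np.exp(-0.5 * ((t - 18) / 3.0) ** 2)         # elution profile, apex at scan 18
minor = 0.05 * np.exp(-0.5 * ((t - 22) / 3.0) ** 2)  # weak coeluting profile, apex at scan 22
s_major = rng.random(n_mz)                           # stand-in mass spectra
s_minor = rng.random(n_mz)
X = np.outer(major, s_major) + np.outer(minor, s_minor)

# Project out only the major spectrum: the residual carries the minor component.
R1 = project_out(X, s_major[None, :])
ev_one = explained_variance(X, R1)
minor_apex = int(np.argmax(np.linalg.norm(R1, axis=1)))

# Project out both spectra: nothing meaningful should remain.
R2 = project_out(X, np.vstack([s_major, s_minor]))
ev_two = explained_variance(X, R2)

print(f"explained variance, 1 component: {ev_one:.4f}")
print(f"explained variance, 2 components: {ev_two:.4f}")
print(f"residual apex after removing the major component: scan {minor_apex}")
```

In this toy setting the residual left after removing the major spectrum follows the elution profile of the minor component, which is the intuition behind using an orthogonal projection to look for components that a first pass missed.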