Guillevic Myriam, Guillevic Aurore, Vollmer Martin K, Schlauri Paul, Hill Matthias, Emmenegger Lukas, Reimann Stefan
Laboratory for Air Pollution/Environmental Technology, Empa, Swiss Federal Laboratories for Materials Science and Technology, Ueberlandstrasse 129, 8600, Dübendorf, Switzerland.
Université de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France.
J Cheminform. 2021 Oct 4;13(1):78. doi: 10.1186/s13321-021-00544-w.
Non-target screening consists of searching a sample for all substances present, whether suspected or unknown, with very little prior knowledge about the sample. This approach was introduced more than a decade ago in the field of water analysis, together with dedicated compound identification tools, but it remains rare for indoor and atmospheric trace gas measurements, despite the clear need for a better understanding of atmospheric trace gas composition. For the systematic detection of emerging trace gases in the atmosphere, a new and powerful analytical method is gas chromatography (GC) of preconcentrated samples, followed by electron ionisation, high resolution mass spectrometry (EI-HRMS). In this work, we present data analysis tools that enable automated fragment formula annotation for unknown compounds measured by GC-EI-HRMS.
Based on co-eluting mass/charge fragments, we developed an innovative data analysis method to reliably reconstruct the chemical formulae of the fragments, using efficient combinatorics and graph theory. The method does not require the presence of the molecular ion, which is absent in approximately 40% of EI spectra. Our method has been trained and validated on >50 halocarbons and hydrocarbons, with 3-20 atoms and molar masses of 30-330 g mol⁻¹, measured with a mass resolution of approx. 3500. For >90% of the compounds, more than 90% of the annotated fragment formulae are correct. Cases of wrong identification can be attributed to the scarcity of detected fragments per compound or the lack of isotopic constraint (no minor isotopocule detected).
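To make the combinatorial step concrete, the following is a minimal sketch of enumerating candidate formulae for a single measured m/z. It is not ALPINAC's algorithm, which the abstract describes only as efficient combinatorics combined with graph theory; this is a plain brute-force search. The function name `candidate_formulae`, the element set (C, H, F, Cl), the 100 ppm tolerance and the assumption of singly charged cations are all illustrative choices that do not come from the paper.

```python
from itertools import product

# Monoisotopic masses in u (standard reference values, rounded).
# The element set is an illustrative assumption, not taken from the paper.
MASS = {"C": 12.0, "H": 1.00782503, "F": 18.99840316, "Cl": 34.96885268}
ELECTRON = 0.00054858  # a singly charged cation has lost one electron


def candidate_formulae(mz, tol_ppm=100.0):
    """Return (formula, error in ppm) pairs whose ion mass matches mz.

    Hypothetical helper: brute-force enumeration over atom counts,
    assuming charge z = 1. Results are sorted by absolute mass error.
    """
    elements = list(MASS)
    tol = mz * tol_ppm * 1e-6      # absolute tolerance in u
    target = mz + ELECTRON         # mass of the corresponding neutral
    # Upper bound on each atom count: the element alone must fit the mass.
    max_counts = [int((target + tol) // MASS[e]) for e in elements]
    hits = []
    for counts in product(*(range(n + 1) for n in max_counts)):
        mass = sum(n * MASS[e] for n, e in zip(counts, elements))
        err = mass - target
        if sum(counts) > 0 and abs(err) <= tol:
            formula = "".join(
                e + (str(n) if n > 1 else "")
                for e, n in zip(elements, counts)
                if n
            )
            hits.append((formula, err / mz * 1e6))
    return sorted(hits, key=lambda h: abs(h[1]))


if __name__ == "__main__":
    for formula, ppm in candidate_formulae(68.9947):
        print(f"{formula:8s} {ppm:+.1f} ppm")
```

For m/z 68.9947 with these assumptions, the search returns CF3 as the closest match and also CH3FCl inside the tolerance window. A single mass is therefore often ambiguous, which is why constraints across co-eluting fragments and isotopocules, as described above, are needed. Exhaustive enumeration also scales poorly with mass and with the number of elements, so it should be read only as an illustration of the search space.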
Our method enables the reconstruction of the most probable chemical formulae independently of spectral databases. It therefore demonstrates the suitability of EI-HRMS data for non-target analysis and paves the way for the identification of substances for which no EI mass spectrum is registered in databases. We illustrate the performance of our method for atmospheric trace gases and suggest that it may be well suited for many other types of samples. The LGPL-licensed Python code is released under the name ALPINAC, for ALgorithmic Process for Identification of Non-targeted Atmospheric Compounds.