How searching against multiple libraries can lead to biased results in GC/MS-based metabolomics.

Author information

Samokhin Andrey S, Matyushin Dmitriy D

Affiliations

Chemistry Department, Lomonosov Moscow State University, Moscow, Russia.

A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russia.

Publication information

Rapid Commun Mass Spectrom. 2023 Feb 15;37(3):e9437. doi: 10.1002/rcm.9437.

DOI:10.1002/rcm.9437

PMID:36409456

Abstract

RATIONALE

Databases of electron ionization mass spectra are often used in GC/MS-based untargeted metabolomics analysis. The results of a library search depend on several factors, such as the size and quality of the database and the library search algorithm. We found that the list of m/z values considered during the search is another important parameter. Unfortunately, software developers usually do not document this information, and it is hidden from the end user.

METHODS

We created synthetic data sets and determined how several popular software products (AMDIS, ChromaTOF, MS Search, and Xcalibur) select the list of m/z values for the library search. We also considered data sets of real mass spectra (present in both the NIST and FiehnLib libraries) and compared the library search results obtained with the different software products. All programs under consideration use the NIST MS Search binaries to perform the library search with the Identity algorithm.

RESULTS

We found that AMDIS and ChromaTOF can give biased library search results under particular conditions. In untargeted metabolomics, this can happen when the NIST and FiehnLib libraries are used simultaneously, the instrument's scan range starts below m/z 85, and the correct answer is present only in the FiehnLib library.

CONCLUSIONS

The main reason for the biased results is that information about the scan range is not stored in the metadata of library records. As a result, with AMDIS and ChromaTOF, some unrecorded peaks are treated as missing during the library search, the respective compound is penalized, and the correct answer falls outside the top five or even the top ten hits. In contrast, the default algorithm that MS Search uses to select the list of considered m/z values does not exhibit this behavior.
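
The penalty described above can be illustrated with a minimal sketch. It is not the NIST Identity algorithm: it assumes a plain cosine similarity, two made-up spectra, and a hypothetical rule that restricts the comparison to m/z values at or above the lowest peak in the library record (used here only as a stand-in for the scan range that is missing from the metadata). The function and variable names are illustrative.

```python
import math

def cosine(query, library, mz_values):
    """Cosine similarity of two spectra over an explicit list of m/z values."""
    q = [query.get(mz, 0.0) for mz in mz_values]
    l = [library.get(mz, 0.0) for mz in mz_values]
    dot = sum(a * b for a, b in zip(q, l))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in l))
    return dot / norm if norm else 0.0

# Hypothetical query spectrum, acquired with a scan range starting below m/z 85.
query = {57: 400.0, 73: 999.0, 147: 600.0, 217: 300.0}
# Hypothetical library record of the same compound, recorded from m/z 85 upward,
# so the peaks at m/z 57 and 73 were never recorded (not truly absent).
library_record = {147: 600.0, 217: 300.0}

# Option 1: consider every m/z seen in either spectrum.
all_mz = sorted(set(query) | set(library_record))

# Option 2 (assumed rule): ignore m/z values below the lowest recorded library peak.
lowest_recorded = min(library_record)
restricted_mz = [mz for mz in all_mz if mz >= lowest_recorded]

score_all = cosine(query, library_record, all_mz)
score_restricted = cosine(query, library_record, restricted_mz)
print(f"all m/z: {score_all:.3f}, restricted: {score_restricted:.3f}")
```

With the full m/z list, the unrecorded low-mass peaks count against the library record and the score drops well below that of a perfect match; with the restricted list, the two spectra agree on every compared peak. A record from a library with a wider recorded range would not suffer this penalty, which is how a correct answer stored only in the narrower library can be pushed down the hit list.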
