Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan.
RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
Anal Chem. 2017 Jun 20;89(12):6766-6773. doi: 10.1021/acs.analchem.7b01010. Epub 2017 May 26.
Compound identification from unknown electron ionization (EI) mass spectra in gas chromatography coupled with mass spectrometry (GC-MS) is challenging in untargeted metabolomics, natural product chemistry, and exposome research. Although publicly and commercially available databases together contain over 900 000 EI-MS records, this resource has not yet been used efficiently in metabolomics. Therefore, we proposed a "four-step" strategy for the identification of biologically significant metabolites using an integrated cheminformatics approach: (i) quality control calibration curves to reduce background noise, (ii) variable selection by hypothesis testing in principal component analysis for the efficient selection of target peaks, (iii) searching the EI-MS spectral database, and (iv) retention index (RI) filtering in combination with RI predictions. In this study, the new MS-FINDER spectral search engine was developed and used to search EI-MS databases by mass spectral similarity, with evaluation of the false discovery rate. Moreover, in silico derivatization software, MetaboloDerivatizer, was developed to calculate the chemical properties of derivative compounds, and retention indexes for all entries in the EI-MS databases were predicted using a simple mathematical model. The strategy was showcased in the identification of three novel metabolites (butane-1,2,3-triol, 3-deoxyglucosone, and palatinitol) in the Chinese medicine Senkyu for quality assessment, as validated using authentic standard compounds. All tools and curated public EI-MS databases are freely available in the 'Computational MS-based metabolomics' section of the RIKEN PRIMe Web site (http://prime.psc.riken.jp).
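Steps (iii) and (iv) of the strategy can be illustrated with a minimal Python sketch. It is not the MS-FINDER implementation: the abstract does not specify the similarity measure, so plain cosine similarity is assumed here, and the function names, thresholds (`min_similarity`, `ri_tolerance`), library entries, and spectra are all hypothetical values chosen for illustration.

```python
import math


def cosine_similarity(query, reference):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Assumption: nominal (integer) m/z values, so peaks match by exact key.
    """
    shared = set(query) & set(reference)
    dot = sum(query[mz] * reference[mz] for mz in shared)
    norm_q = math.sqrt(sum(i * i for i in query.values()))
    norm_r = math.sqrt(sum(i * i for i in reference.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def search_library(query, query_ri, library, min_similarity=0.7, ri_tolerance=50.0):
    """Rank library entries by spectral similarity, then filter by RI.

    The thresholds are illustrative defaults, not values from the paper.
    An entry's "ri" may be a measured or a predicted retention index.
    """
    hits = []
    for entry in library:
        score = cosine_similarity(query, entry["spectrum"])
        if score < min_similarity:
            continue  # step (iii): spectrum too dissimilar
        if abs(entry["ri"] - query_ri) > ri_tolerance:
            continue  # step (iv): retention index too far from the query
        hits.append((entry["name"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy library; the spectra and RI values are made up.
library = [
    {"name": "compound A", "spectrum": {73: 100.0, 147: 50.0, 205: 30.0}, "ri": 1500.0},
    {"name": "compound B", "spectrum": {73: 100.0, 147: 50.0, 205: 30.0}, "ri": 1900.0},
    {"name": "compound C", "spectrum": {57: 100.0, 71: 80.0}, "ri": 1505.0},
]

query_spectrum = {73: 100.0, 147: 50.0, 205: 30.0}
hits = search_library(query_spectrum, query_ri=1510.0, library=library)
for name, score in hits:
    print(f"{name}: similarity {score:.3f}")
```

In this toy case, compound B has the same spectrum as the query but is rejected by the RI filter, which shows why RI filtering helps separate candidates that EI spectra alone cannot distinguish.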