Rainer Johannes, Vicini Andrea, Salzer Liesa, Stanstrup Jan, Badia Josep M, Neumann Steffen, Stravs Michael A, Verri Hernandes Vinicius, Gatto Laurent, Gibb Sebastian, Witting Michael
Institute for Biomedicine (Affiliated to the University of Lübeck), Eurac Research, 39100 Bozen, Italy.
Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, 85764 Neuherberg, Germany.
Metabolites. 2022 Feb 11;12(2):173. doi: 10.3390/metabo12020173.
Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics experiments have become increasingly popular because of the wide range of metabolites that can be analyzed and the possibility of measuring novel compounds. LC-MS instrumentation and analysis conditions can differ substantially among laboratories and experiments, resulting in non-standardized datasets that demand customized annotation workflows. We present an ecosystem of R packages, centered around the , and packages that together provide a modular infrastructure for the annotation of untargeted metabolomics data. Initial annotation can be performed based on MS1 properties such as m/z and retention times, followed by an MS2-based annotation in which experimental fragment spectra are compared against a reference library. Such reference databases can be created and managed with the package. The ecosystem supports data in a variety of formats, including, but not limited to, MSP, MGF, mzML, mzXML and netCDF, as well as MassBank text files and SQL databases. Through its highly customizable functionality, the presented infrastructure enables users to build reproducible annotation workflows tailored to, and adapted for, most untargeted LC-MS-based datasets. All core functionality, which supports base R data types, is exported, which also facilitates its re-use in other R packages. Finally, all packages are thoroughly unit-tested and documented, and are available on GitHub and through Bioconductor.
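The two-stage annotation described above (candidate matching on m/z and retention time, then comparison of experimental fragment spectra against a reference library) can be illustrated with a minimal, language-agnostic sketch. It is written in Python rather than R and is not the API of the presented packages. The function names (`ms1_candidates`, `cosine_similarity`), the default tolerances (10 ppm, 10 s retention time, 0.01 absolute m/z for fragment peaks), the greedy peak pairing and the choice of cosine similarity as the spectral score are all assumptions made for illustration.

```python
from math import sqrt

# Illustrative sketch only: NOT the API of the R packages described in the abstract.


def ms1_candidates(query_mz, query_rt, reference, ppm=10.0, rt_tol=10.0):
    """Stage 1 (assumed scheme): return names of reference entries whose m/z is
    within `ppm` (relative to the reference m/z) and whose retention time is
    within `rt_tol` (same unit as `query_rt`) of the query feature."""
    hits = []
    for name, ref_mz, ref_rt in reference:
        mz_tol = ref_mz * ppm / 1e6  # convert relative ppm tolerance to absolute m/z
        if abs(query_mz - ref_mz) <= mz_tol and abs(query_rt - ref_rt) <= rt_tol:
            hits.append(name)
    return hits


def _pair_peaks(query, ref, tol):
    """Greedily pair each query peak (m/z, intensity) with the closest unused
    reference peak whose m/z differs by at most `tol`."""
    used = set()
    pairs = []
    for q_mz, q_int in query:
        best = None  # (m/z difference, reference index)
        for j, (r_mz, _) in enumerate(ref):
            if j in used:
                continue
            diff = abs(q_mz - r_mz)
            if diff <= tol and (best is None or diff < best[0]):
                best = (diff, j)
        if best is not None:
            used.add(best[1])
            pairs.append((q_int, ref[best[1]][1]))
    return pairs


def cosine_similarity(query, ref, tol=0.01):
    """Stage 2 (assumed score): cosine similarity of two fragment spectra.
    Unmatched peaks add nothing to the numerator but still count in the norms."""
    numerator = sum(a * b for a, b in _pair_peaks(query, ref, tol))
    norm_q = sqrt(sum(i * i for _, i in query))
    norm_r = sqrt(sum(i * i for _, i in ref))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return numerator / (norm_q * norm_r)
```

In a hypothetical run, a query feature is first reduced to its MS1 candidates, and only those candidates' reference spectra are then scored against the query's fragment spectrum. All values below are made up for illustration:

```python
reference = [("cmp_A", 195.0877, 160.0), ("cmp_B", 181.0720, 140.0)]
print(ms1_candidates(195.0880, 162.0, reference))  # ['cmp_A']

query_spec = [(100.0, 10.0), (150.0, 20.0)]
ref_spec = [(100.005, 10.0), (200.0, 5.0)]
print(round(cosine_similarity(query_spec, ref_spec), 6))  # 0.4
```

Keeping the two stages as separate functions mirrors the modular design described in the abstract: the MS1 filter and the spectral score can each be swapped or re-parameterized without touching the other.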