Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States.
Non-communicable Diseases Division, Translational Health Science and Technology Institute, Faridabad, Haryana 121001, India.
Anal Chem. 2022 Oct 4;94(39):13315-13322. doi: 10.1021/acs.analchem.2c00563. Epub 2022 Sep 22.
Untargeted liquid chromatography/high-resolution mass spectrometry (LC/HRMS) assays in metabolomics and exposomics aim to characterize the small-molecule chemical space in a biospecimen. To gain maximum biological insight from these data sets, LC/HRMS peaks should be annotated with chemical and functional information, including molecular formula, structure, chemical class, and metabolic pathways. Among these, molecular formulas may be assigned to LC/HRMS peaks by matching the theoretical and observed isotopic profiles (MS1) of the underlying ionized compound. For this purpose, we have developed the Integrated Data Science Laboratory for Metabolomics and Exposomics-United Formula Annotation (IDSL.UFA) R package. In untargeted metabolomics validation tests, IDSL.UFA ranked the correct molecular formula as the top hit for 54.31-85.51% of true positive annotations and within the top five hits for 90.58-100%. Molecular formula annotations were also supported by tandem mass spectrometry data. We have implemented new strategies to (1) generate formula sources and their theoretical isotopic profiles, (2) optimize the ranking of formula hits for individual and aligned peak lists, and (3) scale IDSL.UFA-based workflows to studies with larger sample sizes. Annotating the raw data of a publicly available pregnancy metabolome study with IDSL.UFA highlighted hundreds of new pregnancy-related compounds and also suggested the presence of chlorinated perfluorotriether alcohols (Cl-PFTrEAs) in human specimens. IDSL.UFA is useful for human metabolomics and exposomics studies in which the loss of biological insight from untargeted LC/HRMS data sets must be minimized. The IDSL.UFA package is available in the R CRAN repository at https://cran.r-project.org/package=IDSL.UFA. Detailed documentation and tutorials are also provided at www.ufa.idsl.me.
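The idea of matching theoretical and observed isotopic profiles can be illustrated with a minimal sketch. This is not the IDSL.UFA implementation or its scoring scheme: it assumes nominal-mass (unit) resolution, approximate natural isotope abundances, and a plain cosine similarity as a stand-in ranking score. The function names (`isotopic_profile`, `rank_candidates`) and the example intensities are hypothetical, chosen only for illustration.

```python
import math

# Approximate natural isotope abundances, indexed by nominal mass offset
# from the lightest isotope (assumed values, rounded).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
    "Cl": [0.7576, 0.0, 0.2424],
}


def convolve(a, b, n_peaks):
    """Convolve two abundance lists, keeping only the first n_peaks offsets."""
    out = [0.0] * n_peaks
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < n_peaks:
                out[i + j] += x * y
    return out


def isotopic_profile(formula, n_peaks=4):
    """Theoretical M, M+1, ... intensities, scaled so the largest peak is 1."""
    profile = [1.0] + [0.0] * (n_peaks - 1)
    for element, count in formula.items():
        for _ in range(count):  # add one atom at a time
            profile = convolve(profile, ISOTOPES[element], n_peaks)
    base = max(profile)
    return [p / base for p in profile]


def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(observed, candidates):
    """Sort candidate formulas by similarity to the observed profile."""
    scored = [
        (name, cosine(observed, isotopic_profile(formula, len(observed))))
        for name, formula in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical observed MS1 profile with a pronounced M+2 peak.
    observed = [1.0, 0.066, 0.32, 0.021]
    # Two candidates sharing nominal mass 112.
    candidates = {
        "C6H5Cl": {"C": 6, "H": 5, "Cl": 1},
        "C7H12O": {"C": 7, "H": 12, "O": 1},
    }
    for name, score in rank_candidates(observed, candidates):
        print(f"{name}: {score:.4f}")
```

In this toy case the chlorinated candidate ranks first because only its theoretical profile reproduces the strong M+2 peak. A real workflow would additionally use accurate mass, adduct handling, and high-resolution isotopologue spacing, none of which are modeled here.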