Habra Hani, Kachman Maureen, Bullock Kevin, Clish Clary, Evans Charles R, Karnovsky Alla
Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109, United States.
Michigan Regional Comprehensive Metabolomics Resource Core, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan 48105, United States.
Anal Chem. 2021 Mar 30;93(12):5028-5036. doi: 10.1021/acs.analchem.0c03693. Epub 2021 Mar 16.
LC-HRMS experiments detect thousands of compounds, but only a small fraction of them are identified in most studies. Traditional data processing pipelines contain an alignment step that assembles the measurements of overlapping features across samples into a unified table. However, data sets acquired under nonidentical conditions are not amenable to this process, mostly because of significant alterations in chromatographic retention times. Alignment of features between disparately acquired LC-MS metabolomics data could aid collaborative compound identification efforts and enable meta-analyses of expanded data sets. Here, we describe metabCombiner, a new computational pipeline for matching known and unknown features in a pair of untargeted LC-MS data sets and concatenating their abundances into a combined table of intersecting feature measurements. metabCombiner groups features by mass-to-charge (m/z) values to generate a search space of possible feature pair alignments, fits a spline through a set of selected retention time ordered pairs, and ranks alignments by m/z, mapped retention time, and relative abundance similarity. We evaluated this workflow on a pair of plasma metabolomics data sets acquired with different gradient elution methods, achieving a mean absolute retention time prediction error of roughly 0.06 min and a weighted per-compound matching accuracy of approximately 90%. We further demonstrate the utility of this method by comprehensively mapping features in urine and muscle metabolomics data sets acquired from different laboratories. metabCombiner has the potential to bridge the gap between otherwise incompatible metabolomics data sets and is available as an R package at https://github.com/hhabra/metabCombiner.
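The three steps named in the abstract (grouping by m/z, mapping retention times, and ranking candidate alignments by similarity) can be illustrated with a minimal Python sketch. This is not the metabCombiner implementation or its API. The names `Feature`, `candidate_pairs`, `make_rt_mapper`, `score`, and `rank_alignments`, the m/z tolerance, the score formula, and its weights are all hypothetical, and the toy features are made up for illustration. To stay dependency-free, the sketch replaces the spline fit with piecewise-linear interpolation between anchor retention time pairs.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    mz: float    # mass-to-charge ratio
    rt: float    # retention time in minutes
    rank: float  # relative abundance as a quantile in [0, 1]


def candidate_pairs(xs, ys, mz_tol=0.005):
    """Pair each feature in xs with every feature in ys within mz_tol (assumed tolerance)."""
    return [(x, y) for x in xs for y in ys if abs(x.mz - y.mz) <= mz_tol]


def make_rt_mapper(anchors):
    """Map data set X retention times onto data set Y.

    Stand-in for the spline: linear interpolation between sorted
    (rt_x, rt_y) anchor pairs, extrapolating from the end segments.
    Assumes at least two anchors with distinct rt_x values.
    """
    pts = sorted(anchors)
    rx = [p[0] for p in pts]
    ry = [p[1] for p in pts]

    def predict(rt):
        # Pick the segment containing rt, clamped to the first/last segment.
        i = min(max(bisect_left(rx, rt), 1), len(rx) - 1)
        x0, x1, y0, y1 = rx[i - 1], rx[i], ry[i - 1], ry[i]
        return y0 + (y1 - y0) * (rt - x0) / (x1 - x0)

    return predict


def score(x, y, predict, mz_w=100.0, rt_w=1.0, ab_w=0.5):
    """Hypothetical similarity in (0, 1]: higher means a closer match."""
    penalty = (
        mz_w * abs(x.mz - y.mz)
        + rt_w * abs(predict(x.rt) - y.rt)
        + ab_w * abs(x.rank - y.rank)
    )
    return 1.0 / (1.0 + penalty)


def rank_alignments(xs, ys, anchors, mz_tol=0.005):
    """Score every candidate pair and sort from best to worst."""
    predict = make_rt_mapper(anchors)
    scored = [
        (score(x, y, predict), x.name, y.name)
        for x, y in candidate_pairs(xs, ys, mz_tol)
    ]
    return sorted(scored, reverse=True)


# Toy data: two features per data set sharing a nominal m/z.
xs = [Feature("X1", 180.0634, 2.0, 0.90), Feature("X2", 180.0640, 5.0, 0.30)]
ys = [Feature("Y1", 180.0636, 3.1, 0.85), Feature("Y2", 180.0638, 7.4, 0.35)]
anchors = [(1.0, 1.5), (4.0, 6.0), (8.0, 11.0)]

ranked = rank_alignments(xs, ys, anchors)
for s, x_name, y_name in ranked:
    print(f"{x_name} -> {y_name}: {s:.3f}")
```

In this toy case all four pairs fall within the m/z tolerance, so m/z alone cannot separate them. The mapped retention time and abundance terms are what place X1/Y1 and X2/Y2 ahead of the two cross pairings.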