Huan Tao, Li Liang
Department of Chemistry, University of Alberta, Edmonton, Alberta T6G 2G2, Canada.
Anal Chem. 2015 Jan 20;87(2):1306-13. doi: 10.1021/ac5039994. Epub 2014 Dec 30.
Metabolomics requires quantitative comparison of individual metabolites present in an entire sample set. Unfortunately, missing intensity values in one or more samples are very common. Because missing values can have a profound influence on metabolomic results, the extent of missing values found in a metabolomic data set should be treated as an important parameter for measuring the analytical performance of a technique. In this work, we report a study on the scope of missing values and a robust method of filling the missing values in a chemical isotope labeling (CIL) LC-MS metabolomics platform. Unlike conventional LC-MS, CIL LC-MS quantifies the concentration differences of individual metabolites in two comparative samples based on the mass spectral peak intensity ratio of a peak pair from a mixture of differentially labeled samples. We show that this peak-pair feature can be exploited as a unique means of extracting metabolite intensity information from raw mass spectra. In our approach, a peak-pair picking algorithm, IsoMS, is initially used to process the LC-MS data set to generate a CSV file or table that contains metabolite ID and peak ratio information (i.e., a metabolite-intensity table). A zero-fill program, freely available from MyCompoundID.org, is developed to automatically find each missing value in the CSV file, go back to the raw LC-MS data to find the corresponding peak pair, calculate the intensity ratio, and enter the ratio value into the table. Most of the missing values are found to be low-abundance peak pairs. We demonstrate the performance of this method in analyzing an experimental and technical replicate data set of the human urine metabolome. Furthermore, we propose a standardized approach of counting missing values in a replicate data set as a way of gauging the extent of missing values in a metabolomics platform.
Finally, we illustrate that applying the zero-fill program, in conjunction with dansylation CIL LC-MS, can lead to a marked improvement in finding significant metabolites that differentiate bladder cancer patients and their controls in a metabolomics study of 109 subjects.
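The zero-fill idea described above can be illustrated with a minimal Python sketch. It is not the IsoMS or zero-fill program itself. The table layout, the raw-peak tuples `(rt, mz, intensity)`, the tolerances, the light/heavy ratio direction, and the 2.00671 Da pair spacing (the mass difference of two 13C versus two 12C atoms, assumed here for a singly labeled peak pair) are all illustrative assumptions rather than details given in the abstract.

```python
from typing import Optional

# Hypothetical defaults: the abstract does not state these values.
MASS_SHIFT = 2.00671  # assumed light/heavy spacing (Da) for a singly labeled pair
MZ_TOL = 0.01         # assumed m/z matching tolerance (Da)
RT_TOL = 30.0         # assumed retention-time matching tolerance (s)


def find_intensity(peaks, mz, rt, mz_tol=MZ_TOL, rt_tol=RT_TOL) -> Optional[float]:
    """Return the highest raw intensity near (mz, rt), or None if nothing matches."""
    hits = [inten for (p_rt, p_mz, inten) in peaks
            if abs(p_mz - mz) <= mz_tol and abs(p_rt - rt) <= rt_tol]
    return max(hits) if hits else None


def count_missing(table) -> int:
    """Count missing ratio entries (None) across all peak pairs and samples."""
    return sum(r is None for row in table for r in row["ratios"].values())


def zero_fill(table, raw, mass_shift=MASS_SHIFT) -> int:
    """Fill missing ratios in place from raw peaks; return how many were filled."""
    filled = 0
    for row in table:
        for sample, ratio in row["ratios"].items():
            if ratio is not None:
                continue  # value already present in the table
            light = find_intensity(raw[sample], row["mz"], row["rt"])
            heavy = find_intensity(raw[sample], row["mz"] + mass_shift, row["rt"])
            if light and heavy:  # both members of the pair found, nonzero
                row["ratios"][sample] = light / heavy  # assumed ratio direction
                filled += 1
    return filled


# Toy metabolite-intensity table: one row per peak pair, one ratio per sample.
table = [
    {"mz": 300.0, "rt": 100.0, "ratios": {"s1": 1.2, "s2": None}},
    {"mz": 400.0, "rt": 200.0, "ratios": {"s1": None, "s2": 0.8}},
]
# Toy raw data: per sample, a list of (rt, mz, intensity) peaks.
raw = {
    "s1": [(100.0, 300.0, 1200.0), (100.0, 302.00671, 1000.0)],
    "s2": [(101.0, 300.002, 50.0), (101.0, 302.008, 100.0)],
}

before = count_missing(table)
filled = zero_fill(table, raw)
after = count_missing(table)
```

In this toy example, the low-abundance pair in sample `s2` is recovered from the raw peaks, while the entry for `s1` at m/z 400 stays missing because no matching pair exists in its raw data. Counting missing values before and after filling, as done with `count_missing`, mirrors in a simplified way the idea of using the missing-value count as a performance measure.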