Leiden/Amsterdam Center for Drug Research (LACDR), Leiden University, P.O. Box 9502, NL-2300 RA Leiden, The Netherlands.
Anal Chem. 2010 Feb 1;82(3):1039-46. doi: 10.1021/ac902346a.
Combining data sets from different objects (for example, from two groups of healthy volunteers from the same population) that were measured on a common set of variables (for example, metabolites or peptides) is desirable for statistical analysis in "omics" studies because it increases statistical power. However, such data sets cannot be combined directly if nonbiological systematic differences exist among the individual data sets, or "blocks". Such differences can, for example, be due to small analytical changes that are likely to accumulate over the long time intervals between blocks of measurements. In this article we present a data transformation method, which we refer to as "quantile equating", that corrects, per variable, for linear and nonlinear differences in distribution among blocks of semiquantitative data obtained with the same analytical method. We demonstrate the successful application of the quantile equating method to data obtained on two typical metabolomics platforms, i.e., liquid chromatography-mass spectrometry and nuclear magnetic resonance spectroscopy. We suggest uni- and multivariate methods to evaluate similarities and differences among data blocks before and after quantile equating. In conclusion, we have developed a method to correct for nonbiological systematic differences among semiquantitative data blocks and have demonstrated its successful application to metabolomics data sets.
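The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of a per-variable quantile mapping, not the authors' implementation. It assumes that one block is chosen as the reference and that each value in another block is mapped to the reference value at the same empirical quantile, with linear interpolation between quantiles. The function name `quantile_equate`, the parameter `n_quantiles`, and the simulated data are all hypothetical and chosen for illustration.

```python
import numpy as np


def quantile_equate(block, reference, n_quantiles=101):
    """Map one variable's values in `block` onto the distribution of `reference`.

    Assumption: empirical quantiles of the two blocks are matched and
    values between quantiles are linearly interpolated. This is an
    illustrative sketch, not the published procedure.
    """
    block = np.asarray(block, dtype=float)
    reference = np.asarray(reference, dtype=float)
    probs = np.linspace(0.0, 1.0, n_quantiles)
    q_block = np.quantile(block, probs)      # quantiles of the block to correct
    q_ref = np.quantile(reference, probs)    # matching quantiles of the reference
    # Piecewise-linear, monotone map from block scale to reference scale.
    return np.interp(block, q_block, q_ref)


# Simulated single variable measured in two blocks (hypothetical data).
rng = np.random.default_rng(0)
reference = rng.lognormal(mean=2.0, sigma=0.3, size=201)
# Second block: same population, plus a nonlinear systematic distortion.
block = 1.4 * rng.lognormal(mean=2.0, sigma=0.3, size=201) ** 1.1 + 0.5

corrected = quantile_equate(block, reference)
print("median before:", np.median(block))
print("median after: ", np.median(corrected))
print("median of reference:", np.median(reference))
```

Because the mapping is monotone, the rank order of samples within a block is preserved; only the scale and shape of the distribution are aligned with the reference. In practice the function would be applied separately to each variable, and it presumes that the blocks are samples from the same population, as the abstract requires.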