Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, United States.
Center for Alternatives to Animal Testing (CAAT), Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, United States.
Anal Chem. 2017 Mar 21;89(6):3517-3523. doi: 10.1021/acs.analchem.6b04719. Epub 2017 Mar 7.
As mass spectrometry-based metabolomics becomes more widely used in biomedical research, it is important to revisit existing data analysis paradigms. Existing preprocessing methods largely extract features separately from each sample and then attempt to group those features across samples to facilitate comparisons. We show that this approach introduces unnecessary variability in peak quantifications, which adversely impacts downstream analysis. We present a new method, bakedpi, for preprocessing both centroid and profile mode metabolomics data. It detects peaks using an intensity-weighted bivariate kernel density estimate computed on a pooling of all samples. The new method reduces this quantification variability and increases power in downstream differential analysis.
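The core idea of detecting peaks from an intensity-weighted bivariate kernel density estimate over pooled samples can be illustrated with a minimal Python sketch. This is not the bakedpi implementation: the Gaussian product kernel, the fixed bandwidths (`bw_mz`, `bw_rt`), the evaluation grid, the relative density threshold, and the simulated scans are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import numpy as np


def weighted_kde_2d(mz, rt, intensity, mz_grid, rt_grid, bw_mz, bw_rt):
    """Intensity-weighted bivariate Gaussian KDE evaluated on an m/z x RT grid."""
    w = intensity / intensity.sum()  # weights sum to 1
    norm_mz = bw_mz * np.sqrt(2.0 * np.pi)
    norm_rt = bw_rt * np.sqrt(2.0 * np.pi)
    # One-dimensional kernels, shape (n_grid, n_points)
    k_mz = np.exp(-0.5 * ((mz_grid[:, None] - mz[None, :]) / bw_mz) ** 2) / norm_mz
    k_rt = np.exp(-0.5 * ((rt_grid[:, None] - rt[None, :]) / bw_rt) ** 2) / norm_rt
    # density[i, j] = sum_k w_k * K(mz_i - mz_k) * K(rt_j - rt_k)
    return (k_mz * w) @ k_rt.T


def local_maxima(density, threshold):
    """Indices of grid cells above threshold that are >= all 8 neighbours."""
    n, m = density.shape
    padded = np.pad(density, 1, mode="constant", constant_values=-np.inf)
    is_peak = density > threshold
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neighbour = padded[1 + di:1 + di + n, 1 + dj:1 + dj + m]
            is_peak &= density >= neighbour
    return np.argwhere(is_peak)


def detect_peaks(mz, rt, intensity, mz_grid, rt_grid,
                 bw_mz=0.01, bw_rt=3.0, rel_threshold=0.1):
    """Return (m/z, RT) locations of density maxima; all defaults are assumed values."""
    density = weighted_kde_2d(mz, rt, intensity, mz_grid, rt_grid, bw_mz, bw_rt)
    idx = local_maxima(density, rel_threshold * density.max())
    return [(float(mz_grid[i]), float(rt_grid[j])) for i, j in idx]


def simulate_pooled_scans(rng, n_samples=3, n_points=200, n_noise=50):
    """Toy data: two true peaks per sample plus low-intensity background noise."""
    true_peaks = [(200.0, 50.0, 1000.0), (200.5, 80.0, 500.0)]
    mz, rt, inten = [], [], []
    for _ in range(n_samples):
        for mz0, rt0, height in true_peaks:
            r = rng.normal(rt0, 2.0, n_points)
            mz.append(rng.normal(mz0, 0.005, n_points))
            rt.append(r)
            inten.append(height * np.exp(-0.5 * ((r - rt0) / 2.0) ** 2))
        mz.append(rng.uniform(199.8, 200.7, n_noise))
        rt.append(rng.uniform(30.0, 100.0, n_noise))
        inten.append(rng.uniform(0.0, 5.0, n_noise))
    # Pooling: observations from all samples are concatenated before the KDE
    return np.concatenate(mz), np.concatenate(rt), np.concatenate(inten)


rng = np.random.default_rng(0)
mz, rt, intensity = simulate_pooled_scans(rng)
mz_grid = np.linspace(199.8, 200.7, 181)  # 0.005 m/z steps
rt_grid = np.linspace(30.0, 100.0, 141)   # 0.5 s steps
for peak_mz, peak_rt in detect_peaks(mz, rt, intensity, mz_grid, rt_grid):
    print(f"peak at m/z {peak_mz:.3f}, RT {peak_rt:.1f}")
```

Because the density is estimated once on the pooled data, every sample shares the same peak locations, so no separate cross-sample grouping step is needed in this sketch. Quantifying each sample within those shared regions is left out here.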