Department of Biostatistics and Bioinformatics, Rollins School of Public Health and Department of Medicine, School of Medicine, Emory University, Atlanta, GA 30322, USA.
Bioinformatics. 2014 Oct 15;30(20):2941-8. doi: 10.1093/bioinformatics/btu430. Epub 2014 Jul 7.
Peak detection is a key step in the preprocessing of untargeted metabolomics data generated from high-resolution liquid chromatography-mass spectrometry (LC/MS). The common practice is to use filters with predetermined parameters to select peaks in the LC/MS profile. This rigid approach can cause suboptimal performance when the choice of peak model and parameters does not suit the data characteristics.
Here we present a method that learns directly from various data features of the extracted ion chromatograms (EICs) to differentiate true peak regions from noise regions in the LC/MS profile. It utilizes the knowledge of known metabolites, as well as robust machine learning approaches. Unlike currently available methods, this new approach does not assume a parametric peak shape model and allows maximum flexibility. We demonstrate the superiority of the new approach using real data. Because matching to known metabolites entails uncertainties and cannot be considered a gold standard, we also developed a probabilistic receiver-operating characteristic (pROC) approach that can incorporate uncertainties.
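The idea of an ROC curve that tolerates uncertain labels can be illustrated with a minimal sketch. It assumes that each candidate region carries a classifier score and a probability of being a true peak (for example, derived from the confidence of a match to a known metabolite). The function names `soft_roc` and `auc`, and the weighting scheme itself, are illustrative assumptions; this is not necessarily the pROC construction used in the paper or the apLCMS implementation.

```python
def soft_roc(scores, probs):
    """Return ROC points (fpr, tpr) when labels are probabilities, not 0/1.

    scores: classifier scores, higher means more peak-like.
    probs:  probs[i] is the assumed probability that candidate i is a true peak.
    Each candidate contributes probs[i] to the positives and 1 - probs[i]
    to the negatives (an illustrative weighting, not the paper's definition).
    """
    if len(scores) != len(probs):
        raise ValueError("scores and probs must have the same length")
    total_pos = sum(probs)
    total_neg = sum(1.0 - p for p in probs)
    if total_pos <= 0 or total_neg <= 0:
        raise ValueError("need non-zero expected positives and negatives")

    # Visit candidates from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0.0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Candidates with tied scores cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            p = probs[order[i]]
            tp += p          # expected true positives
            fp += 1.0 - p    # expected false positives
            i += 1
        points.append((fp / total_neg, tp / total_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With hard 0/1 probabilities this reduces to the ordinary empirical ROC curve; with completely uninformative labels (all probabilities equal to 0.5) the area is 0.5 regardless of the scores, which is the behaviour one would expect when the reference labels carry no information.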
The new peak detection approach is implemented as part of the apLCMS package, available at http://web1.sph.emory.edu/apLCMS/. Contact: tyu8@emory.edu
Supplementary data are available at Bioinformatics online.