Stolt Ragnar, Torgrip Ralf J O, Lindberg Johan, Csenki Leonard, Kolmert Johan, Schuppe-Koistinen Ina, Jacobsson Sven P
Department of Analytical Chemistry, BioSysteMetrics Group, Stockholm University, SE-106 91 Stockholm, Sweden.
Anal Chem. 2006 Feb 15;78(4):975-83. doi: 10.1021/ac050980b.
The first step in analyzing multicomponent LC/MS data from complex samples, such as biofluid metabolic profiles, is to separate the data into information and noise, for example by peak detection. Because this type of data exhibits problems such as alternating backgrounds and differing peak shapes, this can be a difficult task. This paper presents and evaluates a two-dimensional peak detection algorithm that operates on raw, vector-represented LC/MS data. The algorithm exploits the fact that in high-resolution centroid data, chromatographic peaks emerge flanked by data voids along the corresponding mass axis. With the proposed method, only 4 per thousand (0.4%) of the total amount of data from a urine sample is defined as chromatographic peaks; however, 94% of the raw data variance is captured within these peaks. A comparison with bucketed data shows that essentially the same features that an experienced analyst would define as peaks can be extracted automatically with a minimum of noise and background. The method is simple and requires a priori knowledge of only the minimum chromatographic peak width, a system-dependent parameter that is easily assessed. Additional meta parameters are estimated from the data themselves. The result is well-defined chromatographic peaks that are consistently arranged in a matrix at their corresponding m/z values. In the context of automated analysis, the method thus provides an alternative to the traditional approach of bucketing the data followed by denoising and/or one-dimensional peak detection. A software implementation of the proposed algorithm is available at http://www.anchem.su.se/peakd as compiled code for Matlab.
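The idea summarized in the abstract, that a chromatographic peak in centroid data appears as a run of points at nearly the same m/z across consecutive scans, isolated from other points along the mass axis, can be illustrated with a small sketch. This is not the authors' algorithm or their Matlab code. The function names (`detect_traces`, `summarize`), the m/z tolerance `mz_tol`, and the toy data are all assumptions made for illustration. Only the minimum peak width (`min_width`, in scans) corresponds to the one parameter the abstract says must be supplied. The sum of squared intensities is used here as a rough stand-in for the "raw data variance" figure.

```python
from typing import List, Tuple

Point = Tuple[float, float]        # (m/z, intensity) of one centroid
Trace = List[Tuple[int, float, float]]  # (scan index, m/z, intensity)


def detect_traces(scans: List[List[Point]], mz_tol: float, min_width: int) -> List[Trace]:
    """Link centroids across consecutive scans; keep runs of >= min_width scans.

    Hypothetical illustration only: a greedy nearest-m/z linker, not the
    published method.
    """
    open_traces: List[Trace] = []   # traces extended in the previous scan
    finished: List[Trace] = []
    for i, scan in enumerate(scans):
        next_open: List[Trace] = []
        used = set()
        for trace in open_traces:
            last_mz = trace[-1][1]
            best = None  # (distance, index) of closest unused centroid
            for j, (mz, _) in enumerate(scan):
                d = abs(mz - last_mz)
                if j not in used and d <= mz_tol and (best is None or d < best[0]):
                    best = (d, j)
            if best is None:
                finished.append(trace)          # run ended: void at this m/z
            else:
                j = best[1]
                used.add(j)
                trace.append((i, scan[j][0], scan[j][1]))
                next_open.append(trace)
        # every unmatched centroid starts a new candidate trace
        for j, (mz, inten) in enumerate(scan):
            if j not in used:
                next_open.append([(i, mz, inten)])
        open_traces = next_open
    finished.extend(open_traces)
    return [t for t in finished if len(t) >= min_width]


def summarize(scans: List[List[Point]], traces: List[Trace]) -> Tuple[float, float]:
    """Return (fraction of data points kept, fraction of squared intensity kept)."""
    total_n = sum(len(s) for s in scans)
    total_ss = sum(inten ** 2 for s in scans for _, inten in s)
    kept_n = sum(len(t) for t in traces)
    kept_ss = sum(inten ** 2 for t in traces for _, _, inten in t)
    return kept_n / total_n, kept_ss / total_ss


# Toy data (invented): one peak near m/z 200 plus one isolated noise point per scan.
scans = [
    [(150.3, 2.0), (200.000, 10.0)],
    [(200.001, 50.0), (310.7, 3.0)],
    [(120.1, 1.0), (199.999, 100.0)],
    [(200.000, 50.0), (450.9, 2.0)],
    [(200.001, 10.0), (275.5, 1.0)],
]
traces = detect_traces(scans, mz_tol=0.01, min_width=3)
point_frac, energy_frac = summarize(scans, traces)
print(len(traces), point_frac, energy_frac)
```

In this toy case the retained trace holds half of the data points but nearly all of the squared intensity, which mirrors in miniature the kind of ratio the abstract reports for the urine sample. Real data would need handling of retention-time gaps, overlapping isotopes, and data-driven tolerances, none of which this sketch attempts.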