Tan Chuen Seng, Ploner Alexander, Quandt Andreas, Lehtiö Janne, Pernemalm Maria, Lewensohn Rolf, Pawitan Yudi
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Proteomics. 2006 Dec;6(23):6124-33. doi: 10.1002/pmic.200600505.
Peak detection is a key step in the analysis of SELDI-TOF-MS spectra, but the current default method has low specificity and poor peak annotation. To improve data quality, scientists still have to validate the identified peaks visually, a tedious and time-consuming process, especially for large data sets. Hence, there is a genuine need for methods that minimize manual validation. We have previously reported a multi-spectral signal detection method with improved specificity, called RS for 'region of significance'. Here we extend it with a peak quantification algorithm based on annotated regions of significance (ARS). For each spectral region flagged as significant by RS, we first identify a dominant spectrum, which is used to determine the number of peaks and the m/z region of each peak. For each such m/z region, a peak template is extracted from all spectra by principal component analysis. Finally, using the template, we estimate the amplitude and location of the peak in each spectrum by least squares and refine the amplitude estimate with a mixture model. We have evaluated the ARS algorithm on patient samples from a clinical study. Comparison with the standard method shows that ARS (i) inherits the superior specificity of RS, and (ii) gives more accurate peak annotations. In conclusion, we find that ARS alleviates the main problems in the preprocessing of SELDI-TOF spectra. The R package ProSpect, which implements ARS, is freely available for academic use at http://www.meb.ki.se/~yudpaw.
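The template-fitting step described in the abstract can be illustrated with a minimal Python sketch. This is not the ProSpect implementation: the function names `peak_template`, `shift_template` and `fit_peak`, and the `max_shift` parameter, are hypothetical. It also assumes that all spectra share one m/z grid, that an uncentered SVD is an acceptable stand-in for the principal component analysis, and that peak location can be searched over whole grid steps. The mixture-model refinement of the amplitude is omitted.

```python
import numpy as np


def peak_template(window):
    """Estimate a common peak shape from spectra restricted to one m/z region.

    window: 2-D array, one row per spectrum, one column per m/z grid point.
    Assumption: the leading right singular vector (uncentered PCA) is used
    as the template.
    """
    _, _, vt = np.linalg.svd(window, full_matrices=False)
    template = vt[0]
    # Singular vectors have arbitrary sign; orient the peak upward.
    if template.sum() < 0:
        template = -template
    # Scale to unit height so fitted amplitudes read as peak heights.
    return template / template.max()


def shift_template(template, shift):
    """Shift the template by a whole number of grid points, padding with zeros."""
    out = np.zeros_like(template)
    if shift > 0:
        out[shift:] = template[:-shift]
    elif shift < 0:
        out[:shift] = template[-shift:]
    else:
        out[:] = template
    return out


def fit_peak(spectrum, template, max_shift=3):
    """Least-squares amplitude and location (grid shift) for one spectrum.

    For each candidate shift the optimal amplitude has a closed form;
    the shift with the smallest residual sum of squares is kept.
    """
    best_amp, best_shift, best_rss = 0.0, 0, np.inf
    for shift in range(-max_shift, max_shift + 1):
        shifted = shift_template(template, shift)
        amp = float(shifted @ spectrum) / float(shifted @ shifted)
        rss = float(np.sum((spectrum - amp * shifted) ** 2))
        if rss < best_rss:
            best_amp, best_shift, best_rss = amp, shift, rss
    return best_amp, best_shift
```

In use, `peak_template` would be called once per peak region on the stacked spectra, and `fit_peak` once per spectrum with the resulting template. A finer location estimate would need interpolation between grid points, which this sketch does not attempt.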