Hsueh Huey-Miin, Kuo Hsun-Chih, Tsai Chen-An
Department of Statistics, National Cheng-Chi University, Taipei, Taiwan.
J Biopharm Stat. 2008;18(5):869-82. doi: 10.1080/10543400802278064.
An important objective in mass spectrometry (MS) is to identify, from tens or hundreds of spectra, a set of biomarkers that can potentially distinguish between patients receiving distinct treatments (or having distinct conditions). A common two-step approach involving peak extraction and quantification is employed to identify the features of scientific interest. The selected features are then investigated further to understand the underlying biological mechanisms of individual proteins or to develop genomic biomarkers for early diagnosis. However, the use of inadequate or ineffective peak detection and peak alignment algorithms in the peak extraction step may lead to a high rate of false positives. It is also crucial to reduce the false positive rate when detecting biomarkers from tens or hundreds of spectra. Here a new procedure is introduced for feature extraction in mass spectrometry data that extends the continuous wavelet transform-based (CWT-based) algorithm to multiple spectra. The proposed multispectra CWT-based algorithm (MCWT) not only performs peak detection for multiple spectra but also carries out peak alignment at the same time. The authors' MCWT algorithm constructs a reference, which integrates information from multiple raw spectra, for feature extraction. The algorithm is applied to a SELDI-TOF mass spectra data set provided by CAMDA 2006 with known polypeptide m/z positions. This new approach is easy to implement, and it outperforms the existing peak extraction method from the Bioconductor PROcess package.
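The idea of detecting peaks on a single reference built from many spectra can be illustrated with a minimal Python sketch. It is not the authors' MCWT implementation: the abstract does not say how the reference is constructed, so the pointwise mean used here is an assumption, and the peak picking (local maxima of summed Ricker-wavelet coefficients) is a simplification of CWT ridge-line detection. All function names, the `widths` scales, and `rel_threshold` are hypothetical.

```python
import math

def ricker(points, a):
    """Ricker (Mexican hat) wavelet of width `a`, sampled at `points` positions."""
    amp = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    half = (points - 1) / 2.0
    out = []
    for i in range(points):
        t = (i - half) / a
        out.append(amp * (1.0 - t * t) * math.exp(-t * t / 2.0))
    return out

def convolve_same(signal, kernel):
    """Direct convolution with zero padding; output has the signal's length.
    The kernel is assumed symmetric and of odd length."""
    n, k = len(signal), len(kernel)
    half = k // 2
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j in range(k):
            idx = i + j - half
            if 0 <= idx < n:
                acc += signal[idx] * kernel[j]
        out[i] = acc
    return out

def reference_spectrum(spectra):
    """Assumed reference: pointwise mean of spectra on a shared m/z grid."""
    n = len(spectra[0])
    return [sum(s[i] for s in spectra) / len(spectra) for i in range(n)]

def detect_peaks(spectra, widths=(2, 4, 6), rel_threshold=0.2):
    """Return indices of local maxima of the summed wavelet coefficients
    of the reference spectrum that exceed a relative cutoff."""
    ref = reference_spectrum(spectra)
    score = [0.0] * len(ref)
    for a in widths:
        row = convolve_same(ref, ricker(10 * a + 1, a))
        score = [s + r for s, r in zip(score, row)]
    cutoff = rel_threshold * max(score)
    peaks = []
    for i in range(1, len(score) - 1):
        if score[i] > cutoff and score[i] > score[i - 1] and score[i] >= score[i + 1]:
            peaks.append(i)
    return peaks

def make_spectrum(shift, n=200):
    """Synthetic spectrum: two Gaussian peaks, shifted to mimic misalignment."""
    centers_heights = [(60 + shift, 10.0), (140 + shift, 6.0)]
    return [
        sum(h * math.exp(-0.5 * ((i - c) / 3.0) ** 2) for c, h in centers_heights)
        for i in range(n)
    ]

# Three slightly misaligned spectra yield one shared set of peak positions.
spectra = [make_spectrum(shift) for shift in (-1, 0, 1)]
peaks = detect_peaks(spectra)
print(peaks)
```

Because peaks are located once on the shared reference rather than separately in each spectrum, every spectrum inherits the same peak positions, which is the sense in which detection and alignment happen in one step in this sketch.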