Zhang Shuqin, Wang Honghui, Zhou Xiaobo, Hoehn Gerard T, DeGraba Thomas J, Gonzales Denise A, Suffredini Anthony F, Ching Wai-Ki, Ng Michael K, Wong Stephen T C
HCNR-CBI, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA.
Proteomics. 2009 Aug;9(15):3833-42. doi: 10.1002/pmic.200800030.
Peak detection is a pivotal first step in biomarker discovery from MS data and can significantly influence the results of downstream data analysis steps. We developed a novel automatic peak detection method for prOTOF MS data that does not require a priori knowledge of protein masses. Random noise is removed by an undecimated wavelet transform, and chemical noise is attenuated by an adaptive short-time discrete Fourier transform. Isotopic peaks corresponding to a single protein are combined by extracting an envelope over them. Peaks in each individual spectrum are then detected according to their S/N, and those with the highest intensity within their peak clusters are recorded. The peaks common to all spectra are identified by choosing an appropriate cut-off threshold in complete linkage hierarchical clustering. To remove the 1 Da shifting of the peaks, the peak assigned to a given protein is taken to be the detected peak that occurs most often within its neighborhood. We validated this method using a data set of serial peptide and protein calibration standards. Compared with the MoverZ program, our new method detects more peaks and significantly enhances the S/N of the peaks after chemical noise removal. We then successfully applied this method to a data set of prOTOF MS spectra of albumin and albumin-bound proteins from serum samples of 59 patients with carotid artery disease compared to vascular disease-free patients, detecting peaks with S/N ≥ 2. Our method is easily implemented and is highly effective for defining peaks that will be used for disease classification or for highlighting potential biomarkers.
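The cross-spectrum step described above, complete linkage hierarchical clustering with a cut-off threshold, can be illustrated with a minimal Python sketch using SciPy. This is not the authors' implementation. The function name `common_peaks`, the parameters `cutoff_da` and `min_fraction`, their default values, and the example m/z values are all hypothetical and chosen only for illustration. The abstract does not state the cut-off used or how many spectra a peak must appear in.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def common_peaks(peak_lists, cutoff_da=0.5, min_fraction=0.5):
    """Group per-spectrum peak positions into common peaks.

    peak_lists   : list of sequences of peak m/z values, one per spectrum
    cutoff_da    : assumed distance cut-off (Da) applied to the dendrogram
    min_fraction : assumed minimum fraction of spectra that must contribute
                   a peak for the cluster to be kept
    Returns the mean m/z of each retained cluster, sorted ascending.
    """
    # Pool all peaks and remember which spectrum each one came from.
    mz = np.concatenate([np.asarray(p, dtype=float) for p in peak_lists])
    spectrum_id = np.concatenate(
        [np.full(len(p), i) for i, p in enumerate(peak_lists)]
    )

    # Complete linkage: the distance between two clusters is the largest
    # pairwise distance, so cutting at cutoff_da bounds each cluster's width.
    tree = linkage(mz.reshape(-1, 1), method="complete")
    labels = fcluster(tree, t=cutoff_da, criterion="distance")

    n_spectra = len(peak_lists)
    centers = []
    for label in np.unique(labels):
        members = labels == label
        # Count distinct spectra contributing to this cluster.
        n_present = np.unique(spectrum_id[members]).size
        if n_present / n_spectra >= min_fraction:
            centers.append(float(mz[members].mean()))
    return sorted(centers)


# Three made-up spectra; only the peaks near 1000 and 2000 Da recur.
peaks = [
    [1000.1, 2000.0, 3000.2],
    [1000.2, 2000.1],
    [1000.0, 2000.3, 4000.0],
]
common = common_peaks(peaks, cutoff_da=0.5, min_fraction=2 / 3)
```

With complete linkage, no two peaks in a cluster are farther apart than the cut-off, which is why the choice of threshold directly controls how much mass shift is tolerated when matching peaks across spectra.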