Shin Byunghee, Jung Hee-Jung, Hyung Seok-Won, Kim Hokeun, Lee Dongkyu, Lee Cheolju, Yu Myeong-Hee, Lee Sang-Won
Functional Proteomics Center, Life Sciences Division, Korea Institute of Science and Technology, 136-791, Hawalgok-dong, Seongbuk-gu, Seoul 130-650, Republic of Korea.
Mol Cell Proteomics. 2008 Jun;7(6):1124-34. doi: 10.1074/mcp.M700419-MCP200. Epub 2008 Feb 25.
Methods for treating MS/MS data to achieve accurate peptide identification are currently the subject of much research activity. In this study we describe a new method for filtering MS/MS data and refining precursor masses that provides highly accurate analyses of massive sets of proteomics data. This method, termed "postexperiment monoisotopic mass filtering and refinement" (PE-MMR), consists of four data processing steps: 1) generating lists of all monoisotopic masses observed in a whole LC/MS experiment, 2) clustering the monoisotopic masses of a peptide into unique mass classes (UMCs) based on their masses and LC elution times, 3) matching the precursor masses of the MS/MS data to a representative mass of a UMC, and 4) filtering the MS/MS data based on the presence of corresponding monoisotopic masses and refining the precursor ion masses with the UMC mass. PE-MMR increases the throughput of proteomics data analysis by efficiently removing "garbage" MS/MS data prior to database searching, and it improves the mass measurement accuracy for an increased number of identified peptides: 0.05 ± 1.49 ppm for yeast data (from 4.46 ± 2.81 ppm) and 0.03 ± 3.41 ppm for glycopeptide data (from 4.8 ± 7.4 ppm). In proteomics analyses of glycopeptide-enriched samples, PE-MMR processing greatly reduces false glycopeptide identifications by correctly assigning the monoisotopic masses of the precursor ions prior to database searching. By applying this technique to analyses of proteome samples of varying complexity, we demonstrate that PE-MMR is an effective and accurate method for treating massive sets of proteomics data.
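The four processing steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Feature`, `UMC`, `cluster_umcs`, and `refine_precursors`, the greedy clustering rule, the use of the mean as the representative UMC mass, the tolerance values, and the isotope-shift check are all assumptions made for illustration. The sketch takes a list of observed monoisotopic masses as given (step 1), clusters them by mass and elution time (step 2), and then matches, filters, and refines MS/MS precursor masses against the resulting UMCs (steps 3 and 4).

```python
from dataclasses import dataclass
from statistics import mean

# Mass difference between 13C and 12C in Da; used to test whether a precursor
# was picked on a non-monoisotopic peak (an assumption of this sketch).
ISOTOPE_SPACING = 1.00335


@dataclass
class Feature:
    mass: float  # observed monoisotopic mass (Da)
    rt: float    # LC elution time (min)


@dataclass
class UMC:
    masses: list
    rts: list

    @property
    def mass(self):
        # Representative mass: the mean of the member masses (assumed choice).
        return mean(self.masses)


def ppm_error(observed, reference):
    """Signed mass error in parts per million."""
    return (observed - reference) / reference * 1e6


def cluster_umcs(features, mass_tol_ppm=10.0, max_rt_gap=0.5):
    """Step 2: greedily group features that agree in mass and elute close together."""
    umcs = []
    for f in sorted(features, key=lambda f: f.rt):
        for u in umcs:
            close_in_mass = abs(ppm_error(f.mass, u.mass)) <= mass_tol_ppm
            close_in_time = f.rt - u.rts[-1] <= max_rt_gap
            if close_in_mass and close_in_time:
                u.masses.append(f.mass)
                u.rts.append(f.rt)
                break
        else:
            umcs.append(UMC([f.mass], [f.rt]))
    return umcs


def refine_precursors(precursors, umcs, mass_tol_ppm=10.0, rt_tol=0.5,
                      max_isotope_shift=2):
    """Steps 3 and 4: match each (scan_id, mass, rt) precursor to a UMC.

    Matched precursors get the UMC mass; unmatched spectra are dropped.
    """
    kept = []
    for scan_id, mass, rt in precursors:
        best = None  # (absolute ppm error, refined mass)
        for u in umcs:
            if not (u.rts[0] - rt_tol <= rt <= u.rts[-1] + rt_tol):
                continue
            # Also try the mass shifted down by 0, 1, 2, ... isotope spacings.
            for k in range(max_isotope_shift + 1):
                err = abs(ppm_error(mass - k * ISOTOPE_SPACING, u.mass))
                if err <= mass_tol_ppm and (best is None or err < best[0]):
                    best = (err, u.mass)
        if best is not None:
            kept.append((scan_id, best[1]))
    return kept
```

Calling `cluster_umcs` on the feature list and passing the result to `refine_precursors` returns only those spectra whose precursor matches a UMC, each paired with the UMC mass. A real pipeline would additionally need deisotoping of the raw LC/MS spectra to produce the feature list and charge-state handling to convert precursor m/z values to neutral masses, both of which are omitted here.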