Shen Yufeng, Tolić Nikola, Hixson Kim K, Purvine Samuel O, Pasa-Tolić Ljiljana, Qian Wei-Jun, Adkins Joshua N, Moore Ronald J, Smith Richard D
Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, USA.
Anal Chem. 2008 Mar 15;80(6):1871-82. doi: 10.1021/ac702328x. Epub 2008 Feb 14.
Identifying proteins and their modification states with known levels of confidence remains a significant challenge for proteomics. Random or decoy peptide databases are increasingly being used to estimate the false discovery rate (FDR), e.g., from liquid chromatography-tandem mass spectrometry (LC-MS/MS) analyses of tryptic digests. We show that this approach can significantly underestimate the FDR and describe an approach for more confident protein identifications that uses unique partial sequences derived from a combination of database searching and amino acid residue sequencing using high-accuracy MS/MS data. Applied to a Saccharomyces cerevisiae tryptic digest, the approach provided 3,132 confident peptide identifications (approximately 5% modified in some fashion), covering 575 proteins with an estimated FDR of zero. The conventional approach provided 3,359 peptide identifications and 656 proteins with 0.3% FDR based upon a decoy database analysis. However, the present approach revealed approximately 5% of the 3,359 identifications to be incorrect and many more to be potentially ambiguous (e.g., because certain amino acid substitutions and modifications had not been considered). In addition, 677 peptides and 39 proteins were identified that had been missed by conventional analysis, including nontryptic peptides, peptides with a variety of expected/unexpected chemical modifications, known/unknown post-translational modifications, single nucleotide polymorphisms or gene encoding errors, and multiple modifications of individual peptides.
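The gap between the decoy-based estimate and the error rate found by the sequence-based approach can be made concrete with a small calculation. The sketch below is illustrative only: the abstract does not state how the decoy FDR was computed, so the ratio of decoy hits to target hits used here is an assumed, commonly used convention, and the function names `decoy_fdr` and `expected_false_hits` are hypothetical. The 5% figure is reported only as approximate.

```python
def decoy_fdr(n_target_hits: int, n_decoy_hits: int) -> float:
    """Estimate FDR as decoy hits / target hits above the same score cutoff.

    Assumption: this simple ratio is one common convention; the source
    abstract does not say which formula was used.
    """
    if n_target_hits <= 0:
        raise ValueError("n_target_hits must be positive")
    return n_decoy_hits / n_target_hits


def expected_false_hits(n_identifications: int, fdr: float) -> float:
    """Number of identifications expected to be wrong at a given FDR."""
    return n_identifications * fdr


# Figures taken from the abstract.
conventional_ids = 3359   # peptide identifications, conventional approach
decoy_based_fdr = 0.003   # 0.3% FDR from the decoy database analysis
observed_error = 0.05     # approximately 5% found to be incorrect

implied_by_decoy = expected_false_hits(conventional_ids, decoy_based_fdr)
implied_by_sequencing = expected_false_hits(conventional_ids, observed_error)

print(f"False hits implied by decoy FDR:      {implied_by_decoy:.0f}")
print(f"False hits implied by ~5% error rate: {implied_by_sequencing:.0f}")
```

Under these assumptions, a 0.3% FDR corresponds to roughly 10 incorrect identifications among the 3,359, whereas an error rate of about 5% corresponds to roughly 168, which is the scale of underestimation the abstract describes.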