Chen Yue, Zhang Junmei, Xing Gang, Zhao Yingming
Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390-9038, USA.
J Proteome Res. 2009 Jun;8(6):3141-7. doi: 10.1021/pr900172v.
False positives that arise when MS/MS data are used to search protein sequence databases remain a concern in proteomics research. Here, we present five types of false positives identified when sequences were aligned to MS/MS spectra with the Mascot database search software. These false positives arise from (1) enzymatic digestion at abnormal sites; (2) misinterpretation of charge states; (3) misinterpretation of protein modifications; (4) incorrect assignment of the protein modification site; and (5) incorrect use of isotopic peaks. We present examples that manual inspection clearly identified as false positives but that were nevertheless assigned high scores by the Mascot sequence alignment algorithm. In some examples, the sequence assigned to the MS/MS spectrum explains more than 80% of the fragment ions present. Because the false positives are highly similar in sequence to their corresponding true hits, the false positive rate cannot be evaluated by the common method of searching a reversed or scrambled sequence database. A common feature of the false positives is the presence of unmatched peaks in the MS/MS spectra. Our studies highlight the importance of using unmatched peaks to remove false positives and offer direction for developing better sequence alignment algorithms for peptide and PTM identification.
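As a rough illustration of the unmatched-peak check the abstract points to, the sketch below separates the peaks of a spectrum into those explained by a candidate peptide and those left unexplained. It is not the authors' procedure and not Mascot's scoring: the function names, the 0.5 Da tolerance, the 5% relative intensity cutoff, and the restriction to singly charged, unmodified b and y ions are all assumptions chosen for brevity.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mzs(peptide):
    """Return singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


def split_peaks(peaks, peptide, tol=0.5, min_rel_intensity=0.05):
    """Split (m/z, intensity) peaks into matched and unmatched lists.

    Peaks below min_rel_intensity of the base peak are ignored as noise.
    tol and min_rel_intensity are illustrative defaults, not recommended values.
    """
    theoretical = fragment_mzs(peptide)
    base = max(intensity for _, intensity in peaks)
    matched, unmatched = [], []
    for mz, intensity in peaks:
        if intensity < min_rel_intensity * base:
            continue
        label = next(
            (name for name, t in theoretical.items() if abs(t - mz) <= tol),
            None,
        )
        if label is None:
            unmatched.append((mz, intensity))
        else:
            matched.append((mz, intensity, label))
    return matched, unmatched


# Hypothetical spectrum for the made-up candidate sequence "PEPTIDE".
example_peaks = [
    (227.10, 100.0), (324.16, 80.0), (148.06, 60.0),
    (500.00, 90.0), (263.09, 2.0),
]
matched, unmatched = split_peaks(example_peaks, "PEPTIDE")
print("matched:", matched)
print("unmatched:", unmatched)
```

In this toy spectrum, an intense peak that the candidate sequence cannot explain is exactly the kind of evidence the abstract suggests inspecting before accepting a high-scoring hit. A fuller check would also need to consider other charge states, neutral losses, modifications, and isotopic peaks.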