Ji Chao, Li Sujun, Reilly James P, Radivojac Predrag, Tang Haixu
Department of Computer Science and Informatics and Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States.
J Proteome Res. 2016 Jun 3;15(6):1830-41. doi: 10.1021/acs.jproteome.6b00004. Epub 2016 May 6.
Chemical cross-linking combined with mass spectrometric analysis has become an important technique for probing protein three-dimensional structure and protein-protein interactions. A key step in this process is the accurate identification and validation of cross-linked peptides from tandem mass spectra. Identifying cross-linked peptides is challenging for two reasons: the search space is greatly expanded (all pairs of peptides in a sequence database), and peptide-spectrum matches (PSMs) that contain one correct and one incorrect peptide often receive scores comparable to those in which both peptides are correctly identified. To address these problems and improve the detection of cross-linked peptides, we propose a new database search algorithm, XLSearch. Our approach is based on a data-driven scoring scheme that independently estimates the probability of correctly identifying each individual peptide in the cross-link, given knowledge of whether the other peptide is correctly or incorrectly identified. These conditional probabilities are subsequently used to estimate the joint posterior probability that both peptides are correctly identified. Using data from two previous cross-linking studies, we show the effectiveness of this scoring scheme, particularly in distinguishing true identifications from those containing one incorrect peptide. We also provide evidence that XLSearch achieves more identifications than two alternative methods at the same false discovery rate. XLSearch is available at https://github.com/COL-IU/XLSearch.
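The step from conditional probabilities to a joint posterior can be illustrated with a small sketch. The abstract does not give the formula, so the following is an assumption rather than the authors' implementation: it treats the four conditional probabilities as mutually consistent, recovers the two marginal probabilities from the law of total probability, and then applies the product rule. The function name `joint_posterior` and its parameter names are hypothetical.

```python
def joint_posterior(p_a_given_b, p_a_given_not_b, p_b_given_a, p_b_given_not_a):
    """Estimate P(A correct and B correct) for a cross-linked peptide pair.

    Illustrative assumption, not taken from the XLSearch source:
    A and B are the events "peptide alpha is correct" and "peptide beta
    is correct". The marginals satisfy the law of total probability:
        P(A) = P(A|B) P(B) + P(A|not B) (1 - P(B))
        P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A))
    which is a 2x2 linear system in P(A) and P(B).
    """
    da = p_a_given_b - p_a_given_not_b
    db = p_b_given_a - p_b_given_not_a
    denom = 1.0 - da * db
    if abs(denom) < 1e-12:
        # Degenerate case: the two events fully determine each other,
        # so the marginals cannot be recovered from the conditionals.
        raise ValueError("conditionals do not determine unique marginals")
    # Solve the linear system for P(A), then back-substitute for P(B).
    p_a = (p_a_given_not_b + da * p_b_given_not_a) / denom
    p_b = p_b_given_not_a + db * p_a
    # Product rule: P(A, B) = P(A|B) P(B).
    return p_a_given_b * p_b


if __name__ == "__main__":
    # Conditionals derived from a toy joint distribution with
    # P(A,B)=0.6, P(A,not B)=0.1, P(not A,B)=0.1, P(not A,not B)=0.2.
    print(joint_posterior(6 / 7, 1 / 3, 6 / 7, 1 / 3))
```

When the two peptides are independent (each conditional equals its marginal), the result reduces to the product of the two marginal probabilities. If the four estimated conditionals are not mutually consistent, `p_a_given_b * p_b` and `p_b_given_a * p_a` will differ, and a real implementation would need to decide how to reconcile them.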