Trnka Michael J, Baker Peter R, Robinson Philip J J, Burlingame A L, Chalkley Robert J
Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94158
Mol Cell Proteomics. 2014 Feb;13(2):420-34. doi: 10.1074/mcp.M113.034009. Epub 2013 Dec 12.
Chemical cross-linking mass spectrometry identifies interacting surfaces within a protein assembly through labeling with bifunctional reagents and identifying the covalently modified peptides. The resulting cross-links yield distance constraints that provide a powerful means to model the three-dimensional structure of the assembly. Bioinformatic analysis of cross-linking data from large protein assemblies is challenging because each cross-linked product contains two covalently linked peptides, each of which must be correctly identified from a complex matrix of potential confounders. Protein Prospector addresses these issues through a complementary mass modification strategy in which each peptide is searched and identified separately. We demonstrate this strategy with an analysis of RNA polymerase II. False discovery rates (FDRs) are assessed via comparison of cross-linking data to the crystal structure, as well as by using a decoy database strategy. Parameters that are most useful for positive identification of cross-linked spectra are explored. We find that fragmentation spectra generally contain more product ions from one of the two peptides constituting the cross-link. Hence, metrics reflecting the quality of the spectral match to the less confidently identified peptide provide the most discriminatory power between correct and incorrect matches. A support vector machine model was built to further improve classification of cross-linked peptide hits. Furthermore, the frequency with which peptides cross-linked via common acylating reagents fragment to produce diagnostic, cross-linker-specific ions is assessed. The threshold for successful identification of the cross-linked peptide product depends upon the complexity of the sample under investigation. Protein Prospector, by focusing the reliability assessment on the less confidently identified peptide, is better able to control the FDR as larger complexes and databases are analyzed. In addition, when FDR thresholds are calculated separately for intraprotein and interprotein results, a further improvement in the number of unique cross-links confidently identified is achieved. These improvements are demonstrated on two previously published cross-linking datasets.
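As a rough illustration of two ideas in the abstract (ranking each cross-link by the weaker of its two peptide matches, and setting decoy-based FDR thresholds separately for intraprotein and interprotein hits), the sketch below may help. It is not Protein Prospector's implementation. The `CrosslinkHit` fields, the score scale, the subunit names used as example data, and the simple decoys/targets FDR estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CrosslinkHit:
    """One cross-linked spectrum match (hypothetical record layout)."""
    score_a: float    # match score of peptide 1 (assumed: higher is better)
    score_b: float    # match score of peptide 2
    protein_a: str
    protein_b: str
    is_decoy: bool    # assumed: True if either peptide matched a decoy sequence

    @property
    def limiting_score(self) -> float:
        # The hit is judged by its weaker peptide match.
        return min(self.score_a, self.score_b)

    @property
    def is_intraprotein(self) -> bool:
        return self.protein_a == self.protein_b


def score_threshold(hits, max_fdr):
    """Lowest limiting-score cutoff whose estimated FDR stays <= max_fdr.

    FDR is estimated as decoys / targets among hits at or above the cutoff,
    a simplification chosen for this sketch. Ties in score are not treated
    specially. Returns None if no cutoff qualifies.
    """
    ranked = sorted(hits, key=lambda h: h.limiting_score, reverse=True)
    targets = decoys = 0
    cutoff = None
    for hit in ranked:
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = hit.limiting_score
    return cutoff


def accepted_by_class(hits, max_fdr):
    """Apply a separate threshold to intraprotein and interprotein hits."""
    accepted = {}
    for label, want_intra in (("intraprotein", True), ("interprotein", False)):
        subset = [h for h in hits if h.is_intraprotein == want_intra]
        cutoff = score_threshold(subset, max_fdr)
        accepted[label] = [] if cutoff is None else [
            h for h in subset
            if not h.is_decoy and h.limiting_score >= cutoff
        ]
    return accepted


# Made-up example data; scores and protein assignments are illustrative only.
hits = [
    CrosslinkHit(30, 12, "RPB1", "RPB1", False),
    CrosslinkHit(25, 10, "RPB1", "RPB1", False),
    CrosslinkHit(20, 3, "RPB1", "RPB1", True),
    CrosslinkHit(28, 15, "RPB1", "RPB2", False),
    CrosslinkHit(22, 9, "RPB1", "RPB2", True),
    CrosslinkHit(18, 8, "RPB1", "RPB2", False),
]
result = accepted_by_class(hits, max_fdr=0.05)
```

Splitting the hits before thresholding means that a decoy ranked highly among the interprotein matches does not raise the cutoff applied to the intraprotein matches, which is one plausible reading of why separate thresholds recover more cross-links.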