Wang Jian, Anania Veronica G, Knott Jeff, Rush John, Lill Jennie R, Bourne Philip E, Bandeira Nuno
Bioinformatics Program, University of California, San Diego, La Jolla, California.
Mol Cell Proteomics. 2014 Apr;13(4):1128-36. doi: 10.1074/mcp.M113.035758. Epub 2014 Feb 3.
The combination of chemical cross-linking and mass spectrometry has recently been shown to be a powerful tool for studying protein-protein interactions and elucidating the structure of large protein complexes. However, computational methods for interpreting the complex MS/MS spectra of linked peptides are still in their infancy, which makes high-throughput application of this approach largely impractical. Because large annotated datasets are lacking, most current approaches do not capture the specific fragmentation patterns of linked peptides and are therefore not optimal for identifying cross-linked peptides. Here we propose a generic approach to address this problem and demonstrate it using disulfide-bridged peptide libraries to (i) generate large mass spectral reference datasets for linked peptides efficiently and at low cost and (ii) automatically train an algorithm that identifies linked peptides from MS/MS spectra efficiently and accurately. Using this approach, we identified thousands of MS/MS spectra from disulfide-bridged peptides by searching proteome-scale sequence databases, and we significantly improved the sensitivity of cross-linked peptide identification. In cross-linking studies of proteasome complexes, this allowed us to identify 60% more direct pairwise interactions between the protein subunits of the 20S proteasome complex than existing tools did. The basic framework of this approach and the MS/MS reference dataset it generated should be valuable resources for the future development of new tools for identifying linked peptides.
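Any database search for disulfide-bridged peptides has to decide which pairs of cysteine-containing peptides could explain a spectrum's precursor mass. The sketch below illustrates only that pruning step; it is not the authors' algorithm and does no fragment-ion scoring. It assumes standard monoisotopic residue masses, unmodified peptides, a single disulfide bond (which removes two hydrogen atoms), and a hypothetical 10 ppm tolerance. The function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # H2O added once per linear peptide
HYDROGEN = 1.007825  # one H atom; two are lost when a disulfide bond forms
PROTON = 1.007276    # used to convert an observed m/z to a neutral mass


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def disulfide_pair_mass(a: str, b: str) -> float:
    """Neutral mass of two peptides joined by one disulfide bond (minus 2 H)."""
    return peptide_mass(a) + peptide_mass(b) - 2 * HYDROGEN


def neutral_mass(mz: float, charge: int) -> float:
    """Convert a precursor m/z at a given positive charge to neutral mass."""
    return charge * (mz - PROTON)


def candidate_pairs(peptides, precursor_mass, tol_ppm=10.0):
    """Return cysteine-containing peptide pairs matching the precursor mass.

    Pairs of a peptide with itself are allowed (homodimeric links).
    """
    cys_peptides = [p for p in peptides if "C" in p]
    hits = []
    for a, b in combinations_with_replacement(cys_peptides, 2):
        error_ppm = abs(disulfide_pair_mass(a, b) - precursor_mass) / precursor_mass * 1e6
        if error_ppm <= tol_ppm:
            hits.append((a, b))
    return hits


if __name__ == "__main__":
    target = neutral_mass(298.635796, 2)  # a doubly charged precursor
    print(candidate_pairs(["CK", "ACR", "PEPTIDE", "GCG"], target))
```

Run as a script, this prints `[('CK', 'ACR')]`: "PEPTIDE" is skipped because it has no cysteine, and no other pair falls within 10 ppm. Enumerating all pairs grows quadratically with the number of cysteine-containing peptides, so a proteome-scale search would likely replace this brute-force loop with a mass-sorted index and then score the surviving candidates against fragment ions.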