Department of Proteomics, Merck & Co., Inc., 126 E. Lincoln Avenue, P.O. Box 2000, Rahway, NJ 07065, USA.
Rapid Commun Mass Spectrom. 2011 Dec 15;25(23):3617-26. doi: 10.1002/rcm.5257.
Mass spectrometry-based proteomic experiments have advanced considerably over the past decade, with high-resolution, high-mass-accuracy tandem mass spectrometry (MS/MS) now allowing routine interrogation of large peptides and proteins. A major bottleneck in 'top-down' proteomics, however, is often the ability to identify and characterize complex peptides or proteins from the acquired high-resolution MS/MS spectra. For biological samples containing proteins with multiple unpredicted processing events, unsupervised identification can be particularly challenging. Described here is a newly created search algorithm (MAR) designed for the identification of experimentally detected peptides or proteins. The algorithm relies only on a predefined list of 'differential' modifications (e.g. phosphorylation) and a FASTA-formatted protein database, and identification is not constrained to full-length proteins. The algorithm is further strengthened by its ability to use mass differences identified between chromatographically separated ions within full-scan MS spectra to automatically generate a list of likely 'differential' modifications to be searched. The utility of the algorithm is demonstrated by the identification of 54 unique polypeptides from human apolipoprotein enriched from the high-density lipoprotein (HDL) particle, and search-time benchmarks demonstrate scalability (12 high-resolution MS/MS scans searched per minute with modifications considered). This parallelizable algorithm provides an additional solution for converting high-quality MS/MS data of multiply processed proteins into reliable identifications.
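The abstract does not say how MAR turns full-scan mass differences into a modification list. As an illustration of the general idea only, the following minimal sketch compares every pair of observed neutral masses and reports the known modification shifts that match a pairwise difference. The function name `candidate_modifications`, the small `KNOWN_SHIFTS` table, and the 0.01 Da tolerance are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations

# Monoisotopic mass shifts in Da for a few common modifications.
# This short table is illustrative; a real search would use a fuller list.
KNOWN_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "oxidation": 15.99491,
    "methylation": 14.01565,
}


def candidate_modifications(masses, tol_da=0.01):
    """Return the sorted names of known shifts that match any pairwise
    difference between the observed neutral masses (hypothetical helper)."""
    found = set()
    for a, b in combinations(masses, 2):
        delta = abs(a - b)
        for name, shift in KNOWN_SHIFTS.items():
            # Accept the shift if it lies within the absolute tolerance.
            if abs(delta - shift) <= tol_da:
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    # Three made-up masses: an unmodified species plus two shifted forms.
    observed = [9000.0000, 9079.9663, 9015.9949]
    print(candidate_modifications(observed))
```

The resulting names could then seed the list of 'differential' modifications passed to a database search. A ppm-based tolerance would likely suit high-resolution data better than the fixed Da window used here.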