Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Av. 25 de Mayo y Francia CP(1650), San Martín, Argentina.
Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7BN, UK.
Proteomics. 2019 Feb;19(4):e1800357. doi: 10.1002/pmic.201800357. Epub 2019 Jan 18.
LC-MS/MS has become the standard platform for characterizing immunopeptidomes, the collections of peptides naturally presented at the cell surface by major histocompatibility complex (MHC) molecules. The protocols and algorithms used for immunopeptidomics data analysis are based on tools developed for traditional bottom-up proteomics, which address the identification of peptides generated by tryptic digestion. Such algorithms are generally not tailored to the specific requirements of MHC ligand identification and, as a consequence, immunopeptidomics datasets suffer from the dismissal of informative spectral data and from high false discovery rates. Here, a new pipeline for the refinement of peptide-spectrum matches (PSMs) is proposed, based on the assumption that immunopeptidomes contain a limited number of recurring peptide motifs corresponding to MHC specificities. Sequence motifs are learned directly from the individual peptidome by training a prediction model on high-confidence PSMs. The model is then applied to PSM candidates with lower confidence, and sequences that score significantly higher than random peptides are rescued as likely true ligands. The pipeline is applied to MHC class I immunopeptidomes from three different species and is shown to increase the number of identified ligands by up to 20-30%, while effectively removing false positives and co-precipitation products. Spectral validation using synthetic peptides confirms the identity of a large proportion of the rescued ligands in the experimental peptidome.
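The rescue logic described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the prediction model, so this sketch swaps in a simple position-specific scoring matrix (PSSM) as a stand-in. The function names, the uniform amino acid background, the fixed 9-mer length, the pseudocount, and the 99th-percentile cutoff against random peptides are all illustrative assumptions, not the published method.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_pssm(peptides, length=9, pseudocount=1.0):
    """Learn a log-odds matrix from high-confidence peptides of one length."""
    peptides = [p for p in peptides if len(p) == length]
    if not peptides:
        raise ValueError("no training peptides of the requested length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, peptide):
    """Sum the per-position log-odds of a peptide under the learned motif."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


def rescue_candidates(pssm, candidates, n_random=10000, percentile=0.99, seed=0):
    """Keep low-confidence candidates that outscore most random peptides."""
    rng = random.Random(seed)
    length = len(pssm)
    # Empirical null: scores of uniformly random peptides (assumption).
    null_scores = sorted(
        score(pssm, "".join(rng.choice(AMINO_ACIDS) for _ in range(length)))
        for _ in range(n_random)
    )
    threshold = null_scores[min(int(percentile * n_random), n_random - 1)]
    valid = set(AMINO_ACIDS)
    return [
        p for p in candidates
        # Skip wrong-length peptides and non-standard residues.
        if len(p) == length and set(p) <= valid and score(pssm, p) > threshold
    ]
```

In this sketch, `train_pssm` would be given the sequences of the high-confidence PSMs and `rescue_candidates` the sequences of the lower-confidence ones. A real peptidome typically mixes several MHC specificities and peptide lengths, so a single matrix per length is a simplification of the motif-learning step.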