School of Information Technologies, University of Sydney, NSW 2006, Australia.
J Proteome Res. 2012 May 4;11(5):3035-45. doi: 10.1021/pr300072j. Epub 2012 Mar 30.
A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction utilizes several protein physical properties to assign proteins to expected protein fractions that comprise large-scale MS-based proteomics data. This information is then used to appropriately assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines where data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach through reanalysis of data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously only identified as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.
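The central idea, using the fraction in which peptides were observed to decide between members of a protein group, can be illustrated with a minimal sketch. This is not the published Re-Fraction implementation. It assumes a single physical property (a hypothetical `log_mw`, the log of molecular weight) and replaces the machine learning step with a simple z-score rule. The function names, the input layout, and the `z_max` threshold are illustrative assumptions, and no MaxQuant table parsing is shown.

```python
from statistics import mean, stdev


def fit_fraction_model(training):
    """Estimate the expected property value for each fraction.

    training: iterable of (fraction_id, log_mw) pairs taken from proteins
    that were identified unambiguously (hypothetical input layout).
    Returns {fraction_id: (mean, standard deviation)}.
    """
    by_fraction = {}
    for fraction, value in training:
        by_fraction.setdefault(fraction, []).append(value)
    # A standard deviation needs at least two observations per fraction.
    return {f: (mean(v), stdev(v)) for f, v in by_fraction.items() if len(v) >= 2}


def resolve_group(group, fraction, model, z_max=2.0):
    """Pick the one group member whose property fits the observed fraction.

    group: {protein_id: log_mw} for the members of an ambiguous protein group.
    Returns a protein_id when exactly one member is consistent with the
    fraction, otherwise None (the group stays unresolved).
    """
    if fraction not in model:
        return None
    mu, sd = model[fraction]
    if sd == 0:
        return None
    consistent = [p for p, v in group.items() if abs(v - mu) / sd <= z_max]
    return consistent[0] if len(consistent) == 1 else None


# Toy data, invented for illustration only.
training = [(1, 4.0), (1, 4.1), (1, 3.9), (2, 5.0), (2, 5.1), (2, 4.9)]
model = fit_fraction_model(training)

# Only "isoform_A" matches fraction 1, so the group can be resolved.
resolved = resolve_group({"isoform_A": 4.05, "isoform_B": 5.0}, 1, model)

# Both members match fraction 1, so the group stays ambiguous.
unresolved = resolve_group({"isoform_C": 4.0, "isoform_D": 4.05}, 1, model)
```

Returning `None` whenever zero or several members fit keeps the sketch conservative: a group is resolved only when the fraction evidence singles out one protein, which mirrors the emphasis on specificity in the abstract.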