Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States.
Anal Chem. 2020 Mar 3;92(5):3503-3507. doi: 10.1021/acs.analchem.9b05578. Epub 2020 Feb 17.
Large-scale top-down proteomics characterizes proteoforms in cells globally with high confidence and high throughput using reversed-phase liquid chromatography (RPLC)-tandem mass spectrometry (MS/MS) or capillary zone electrophoresis (CZE)-MS/MS. The false discovery rate (FDR) from the target-decoy database search is typically used to filter identified proteoforms and ensure high-confidence identifications (IDs). However, it has been demonstrated that FDRs in top-down proteomics can be drastically underestimated. An alternative approach to the FDR could therefore be useful for further evaluating the confidence of proteoform IDs after the database search. We argue that accurately predicting the retention/migration times of proteoforms in the RPLC/CZE separation and comparing the predicted and experimental separation times could be a useful and practical approach. To our knowledge, no report in the literature has yet described predicting the separation time of proteoforms using large top-down proteomics data sets. In this pilot study, for the first time, we evaluated various semiempirical models for predicting proteoforms' electrophoretic mobility (μ) using large-scale top-down proteomics data sets from CZE-MS/MS. We achieved a linear correlation between experimental and predicted μ of proteoforms (R² = 0.98) with a simple semiempirical model that uses the number of charges and the molecular mass of each proteoform as its parameters. Our modeling data suggest that the complete unfolding of proteoforms during CZE separation benefits the prediction of their μ. Our results also indicate that N-terminal acetylation and phosphorylation each decrease a proteoform's charge by roughly one charge unit.
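The kind of model described above, with charge and molecular mass as the only inputs, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the functional form ln(1 + αQ)/M^β is one classical semiempirical shape used for electrophoretic mobility, the default values of `alpha` and `beta` are placeholders rather than fitted values from this study, and the function names (`effective_charge`, `predicted_mobility`, `r_squared`) are hypothetical. The charge adjustment simply encodes the abstract's observation that N-terminal acetylation and phosphorylation each lower the charge by about one unit.

```python
import math


def effective_charge(n_positive_sites, n_acetyl=0, n_phospho=0):
    """Estimate a proteoform's charge number Q.

    n_positive_sites is the count of positively charged sites supplied by
    the caller (how to count them is an assumption left to the user).
    Each N-terminal acetylation or phosphorylation is assumed to remove
    roughly one charge unit, as the abstract reports.
    """
    return n_positive_sites - n_acetyl - n_phospho


def predicted_mobility(charge, mass_da, alpha=0.35, beta=0.411):
    """Illustrative semiempirical mobility: ln(1 + alpha*Q) / M**beta.

    alpha and beta are placeholder constants (assumptions), and the result
    is in arbitrary units; only its linear correlation with experimental
    mobility is meaningful here.
    """
    if charge < 0 or mass_da <= 0:
        raise ValueError("charge must be >= 0 and mass_da must be > 0")
    return math.log(1.0 + alpha * charge) / mass_da ** beta


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")
    return (sxy * sxy) / (sxx * syy)
```

In use, one would compute `predicted_mobility(effective_charge(...), mass)` for every identified proteoform, pass those values together with the experimentally derived mobilities to `r_squared`, and flag IDs whose predicted and experimental values disagree strongly for closer inspection.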