V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia.
Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark.
J Proteome Res. 2021 Apr 2;20(4):1864-1873. doi: 10.1021/acs.jproteome.0c00863. Epub 2021 Mar 15.
Proteome-wide analyses rely on tandem mass spectrometry and extensive separation of proteolytic mixtures. This requires considerable instrument time, which is one of the main obstacles to the broader adoption of proteomics in biomedical and clinical research. Recently, we presented a fast proteomic method termed DirectMS1, based on ultrashort LC gradients and on MS1-only mass spectra acquisition and data processing. The method reduces proteome-wide analysis time to a few minutes, at a quantitative proteome coverage depth of 1000 proteins at 1% false discovery rate (FDR). In this work, to further increase the capabilities of DirectMS1, we drew on recent progress in machine learning and applied the LightGBM decision tree boosting algorithm to the scoring of peptide feature matches when processing MS1 spectra. Furthermore, we integrated the peptide feature identification algorithm of DirectMS1 with the recently introduced peptide retention time prediction utility DeepLC. Additional approaches to improving the performance of DirectMS1 are discussed and demonstrated, such as using FAIMS for gas-phase ion separation. With all of these improvements, we identified more than 2000 proteins at 1% FDR from the HeLa cell line in a 5 min gradient LC-FAIMS/MS1 analysis. The data sets generated and analyzed during the current study have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD023977.
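The abstract reports identifications at 1% FDR for scored peptide feature matches but does not say how the FDR is estimated. As an illustration only, the sketch below assumes a target-decoy scheme, which is a common choice in proteomics and not confirmed by this record. The function names `q_values` and `filter_at_fdr`, the `(score, is_decoy)` tuple layout, and the example scores are all hypothetical; in practice the scores might come from a classifier such as LightGBM.

```python
from typing import List, Tuple

Match = Tuple[float, bool]  # (score, is_decoy); higher score assumed better


def q_values(matches: List[Match]) -> List[float]:
    """Return one q-value per match, in the same order as the input.

    Assumption: FDR at a score cutoff is estimated as
    (decoys above cutoff) / (targets above cutoff).
    """
    # Indices of matches sorted from best score to worst.
    order = sorted(range(len(matches)), key=lambda i: matches[i][0], reverse=True)

    fdrs = []
    targets = decoys = 0
    for i in order:
        if matches[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this cutoff or any looser cutoff,
    # which makes the values monotonic in rank.
    result = [0.0] * len(matches)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        result[order[rank]] = running_min
    return result


def filter_at_fdr(matches: List[Match], threshold: float = 0.01) -> List[Match]:
    """Keep target matches whose q-value is at or below the threshold."""
    qs = q_values(matches)
    return [m for m, q in zip(matches, qs) if not m[1] and q <= threshold]


# Made-up scores for illustration; not data from the study.
example = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
accepted = filter_at_fdr(example, threshold=0.01)
```

With the made-up scores above, only the two targets ranked ahead of the first decoy pass a 1% threshold. Real pipelines typically apply this kind of filter at the protein level as well, which this sketch does not attempt.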