Computational Biomedicine Group, Turku Centre for Biotechnology, Finland.
Brief Bioinform. 2018 Nov 27;19(6):1344-1355. doi: 10.1093/bib/bbx054.
Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software packages, both open source and commercial, exist to process raw MS data into quantified protein abundances. Each package includes a set of unique algorithms for the different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the number of missing values produced by the different proteomics software packages and the ability of different data imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing covered the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and estimating logarithmic fold change, and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other software packages decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, we found that local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of data filtering and local least squares imputation increased performance the most in the tested data sets.
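To make the filtering and local least squares imputation steps concrete, the following is a minimal Python sketch of the general idea, not the pipeline used in the study. The function names (`filter_by_missingness`, `lls_impute`), the parameters `max_missing_fraction` and `k`, and the toy matrix are all assumptions introduced for illustration. The sketch simplifies the method by selecting neighbours only from proteins with no missing values and ranking them by absolute Pearson correlation.

```python
import numpy as np


def filter_by_missingness(X, max_missing_fraction=0.5):
    """Keep rows (proteins) whose fraction of NaN values is at most the threshold.

    The threshold is a hypothetical default, not a value taken from the study.
    """
    X = np.asarray(X, dtype=float)
    missing_fraction = np.isnan(X).mean(axis=1)
    return X[missing_fraction <= max_missing_fraction]


def lls_impute(X, k=3):
    """Simplified local least squares imputation (illustrative sketch).

    For each row with NaNs: pick the k complete rows most correlated with it
    over its observed columns, fit the row as a linear combination of those
    neighbours by least squares, and predict the missing columns.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    if complete.shape[0] == 0:
        raise ValueError("at least one row without missing values is required")

    for i in np.where(np.isnan(X).any(axis=1))[0]:
        miss = np.isnan(X[i])
        obs = ~miss
        if obs.sum() < 2:
            continue  # too few observed values to judge similarity; left as NaN

        target = X[i, obs]
        # Absolute Pearson correlation with every complete row (observed columns only).
        t = target - target.mean()
        C = complete[:, obs] - complete[:, obs].mean(axis=1, keepdims=True)
        denom = np.linalg.norm(C, axis=1) * np.linalg.norm(t)
        with np.errstate(divide="ignore", invalid="ignore"):
            corr = np.where(denom > 0, (C @ t) / denom, 0.0)
        neighbours = np.argsort(-np.abs(corr))[:k]

        # Regress the target on its neighbours, then predict the missing entries.
        A = complete[neighbours][:, obs].T   # shape: (n_observed, k)
        B = complete[neighbours][:, miss].T  # shape: (n_missing, k)
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        out[i, miss] = B @ coef
    return out


# Toy intensity matrix (rows: proteins, columns: samples); values are made up.
X = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    [2.0, 2.0, 4.0, 4.0, 6.0, 6.0],
    [1.0, 4.0, 5.0, np.nan, 9.0, 12.0],
    [np.nan, np.nan, np.nan, np.nan, np.nan, 1.0],
])
filtered = filter_by_missingness(X, max_missing_fraction=0.5)
imputed = lls_impute(filtered, k=3)
```

Filtering before imputation removes proteins with too few observations to support a regression, which is one plausible reading of why combining the two steps can help. Real analyses would typically work on log-transformed intensities and tune `k`.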