Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States.
Talus Biosciences, Seattle, Washington 98112, United States.
J Proteome Res. 2023 Nov 3;22(11):3427-3438. doi: 10.1021/acs.jproteome.3c00205. Epub 2023 Oct 20.
Quantitative measurements produced by tandem mass spectrometry proteomics experiments typically contain a large proportion of missing values. Missing values hinder reproducibility, reduce statistical power, and make it difficult to compare across samples or experiments. Although many methods exist for imputing missing values, in practice, the most commonly used methods are among the worst performing. Furthermore, previous benchmarking studies have focused on relatively simple measurements of error such as the mean-squared error between imputed and held-out values. Here we evaluate the performance of commonly used imputation methods using three practical, "downstream-centric" criteria. These criteria measure the ability to identify differentially expressed peptides, generate new quantitative peptides, and improve the peptide lower limit of quantification. Our evaluation comprises several experiment types and acquisition strategies, including data-dependent and data-independent acquisition. We find that imputation does not necessarily improve the ability to identify differentially expressed peptides but that it can identify new quantitative peptides and improve the peptide lower limit of quantification. We find that MissForest is generally the best performing method per our downstream-centric criteria. We also argue that existing imputation methods do not properly account for the variance of peptide quantifications and highlight the need for methods that do.
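As an illustration of the "relatively simple" error measurement that the abstract contrasts with its downstream-centric criteria, the sketch below hides a random fraction of observed values in a peptide-by-sample matrix, imputes them, and computes the mean-squared error on the held-out entries. Everything in it is an assumption made for illustration: the synthetic data, the 20% hold-out fraction, the function names, and the two basic imputers (row mean and row minimum), which are not necessarily the methods evaluated in the paper. It does not implement MissForest or any of the three downstream criteria.

```python
import numpy as np


def mask_held_out(X, frac, rng):
    """Hide a random fraction of the observed entries.

    Returns a masked copy of X and a boolean array marking the held-out cells.
    """
    observed = ~np.isnan(X)
    held_out = observed & (rng.random(X.shape) < frac)
    X_masked = X.copy()
    X_masked[held_out] = np.nan
    return X_masked, held_out


def impute_row_mean(X):
    """Fill each missing value with the mean of the observed values in its row.

    Rows with no observed values fall back to the global observed mean.
    """
    observed = ~np.isnan(X)
    counts = observed.sum(axis=1, keepdims=True)
    sums = np.where(observed, X, 0.0).sum(axis=1, keepdims=True)
    global_mean = X[observed].mean()
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(observed, X, row_means)


def impute_row_min(X):
    """Fill each missing value with the minimum observed value in its row.

    Rows with no observed values fall back to the global observed minimum.
    """
    observed = ~np.isnan(X)
    row_mins = np.where(observed, X, np.inf).min(axis=1, keepdims=True)
    row_mins = np.where(np.isinf(row_mins), X[observed].min(), row_mins)
    return np.where(observed, X, row_mins)


def held_out_mse(X_true, X_imputed, held_out):
    """Mean-squared error between imputed and true values on held-out cells."""
    diff = X_true[held_out] - X_imputed[held_out]
    return float(np.mean(diff ** 2))


# Synthetic log-intensities (hypothetical): 200 peptides x 8 samples,
# each peptide with its own baseline plus small sample-level noise.
rng = np.random.default_rng(0)
baseline = rng.normal(20.0, 2.0, size=(200, 1))
X_true = baseline + rng.normal(0.0, 0.3, size=(200, 8))

X_masked, held_out = mask_held_out(X_true, frac=0.2, rng=rng)
mse_mean = held_out_mse(X_true, impute_row_mean(X_masked), held_out)
mse_min = held_out_mse(X_true, impute_row_min(X_masked), held_out)
print(f"row-mean MSE: {mse_mean:.3f}  row-min MSE: {mse_min:.3f}")
```

A single held-out MSE like this summarizes reconstruction error only; it says nothing about whether imputation helps or hurts differential expression calls or the lower limit of quantification, which is the gap the abstract's downstream-centric criteria are meant to address.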