Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, 79104, Freiburg, Germany.
Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, 79104, Freiburg, Germany.
Proteomics. 2020 Dec;20(24):e2000068. doi: 10.1002/pmic.202000068. Epub 2020 Oct 7.
High-throughput biological data, such as mass spectrometry (MS)-based proteomics data, suffer from systematic non-biological variance caused by systematic errors. This hinders the estimation of "real" biological signals and, in turn, decreases the power of statistical tests and biases the identification of differentially expressed proteins. To remove such unintended variation while retaining the biological signal of interest, analysis workflows for quantitative MS data typically include a normalization step prior to statistical analysis. Several normalization methods, such as quantile normalization (QN), were originally developed for microarray data. In contrast to microarray data, proteomics data may contain features, in the form of protein intensities, that are consistently high across experimental conditions and are therefore found in the tails of the protein intensity distribution. If QN is applied in the presence of such proteins, statistical inference on the intensity profiles of these features is impeded because their variance is estimated with bias. A novel, freely available approach is introduced: "tail-robust quantile normalization" (TRQN). It improves on classical QN by preserving the biological signals of features in the tails of the intensity distribution and by accounting for sample-dependent missing values (MVs).
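The tail problem described in the abstract can be illustrated with a minimal sketch of classical QN. This is not the TRQN method and not the authors' code; the function name `quantile_normalize`, the toy intensity matrix, and the assumption of a complete matrix without missing values are all illustrative choices made here.

```python
import numpy as np


def quantile_normalize(x):
    """Classical QN of a features x samples matrix (assumes no missing values)."""
    x = np.asarray(x, dtype=float)
    # Rank of each feature within its own sample (0 = smallest value).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace every value by the reference value belonging to its rank.
    return reference[ranks]


# Hypothetical log-intensities: 4 proteins (rows) x 3 samples (columns).
# The last protein is the most intense one in every sample.
intensities = np.array([
    [20.0, 21.0, 19.5],
    [22.0, 20.5, 21.0],
    [21.0, 22.5, 22.0],
    [30.0, 33.0, 31.0],
])

normalized = quantile_normalize(intensities)
top_protein = normalized[-1]  # profile of the consistently high protein
```

Because the last protein holds the top rank in every sample, classical QN assigns it the same reference value in all three samples. Its profile across samples becomes constant, so any sample-to-sample differences it carried are removed, which is the kind of distortion in the tails that the abstract attributes to QN.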