Arend Lis, Adamowicz Klaudia, Schmidt Johannes R, Burankova Yuliya, Zolotareva Olga, Tsoy Olga, Pauling Josch K, Kalkhof Stefan, Baumbach Jan, List Markus, Laske Tanja
Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof Forum 3, 85354 Freising, Germany.
Institute for Computational Systems Biology, University of Hamburg, Albert-Einstein-Ring 8-10, 22761 Hamburg, Germany.
Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf201.
Despite significant progress in the accuracy and reliability of mass spectrometry technology in recent decades, and the development of strategies based on isotopic labeling or internal standards, systematic biases originating from non-biological factors remain a significant challenge in data analysis. In addition, the wide range of available normalization methods makes choosing a suitable one difficult. We systematically evaluated 17 normalization and 2 batch effect correction methods, originally developed for preprocessing DNA microarray data but widely applied in proteomics, on 6 publicly available spike-in and 3 label-free and tandem mass tag (TMT) datasets. Contrary to state-of-the-art normalization practice, we found that a reduction in intragroup variation is not directly related to the effectiveness of a normalization method. Furthermore, our results demonstrated that RobNorm and Normics, two methods specifically developed for proteomics data, performed consistently well across the spike-in datasets, as did LoessF, while EigenMS exhibited a high false-positive rate. Finally, based on experimental data, we show that normalization substantially impacts downstream analyses and that this impact is highly dataset-specific, emphasizing the importance of use-case-specific evaluations for novel proteomics datasets. For this purpose, we developed the PROteomics Normalization Evaluator (PRONE), a unifying R package that enables comparative evaluation of normalization methods, including their impact on downstream analyses, and offers considerable flexibility in recognition of the lack of universally accepted standards. PRONE is available on Bioconductor, and its web application is accessible at https://exbio.wzw.tum.de/prone/.
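The claim that lower intragroup variation does not by itself show that a normalization worked can be illustrated with a small Python sketch. It is not PRONE code and not the study's evaluation pipeline. It assumes toy log2 intensities invented for illustration (5 proteins, 2 groups of 2 samples, one spiked-in protein `P5`), a simple median-centering normalization, a deliberately pathological transform (`collapse_to_protein_mean`) that replaces every value by its protein mean, a hypothetical `mean_intragroup_sd` helper as the variation metric, and a naive |log2FC| >= 1 cutoff standing in for a proper statistical test.

```python
from statistics import mean, median, stdev

# Hypothetical toy log2 intensities (NOT data from the study):
# 5 proteins x 4 samples; P5 is the only spiked-in (truly changed) protein.
PROTEINS = ["P1", "P2", "P3", "P4", "P5"]
RAW = {
    "a1": [20.0, 21.0, 22.0, 23.0, 24.0],
    "a2": [21.0, 22.0, 23.0, 24.0, 25.0],  # a1 plus a +1 loading offset
    "b1": [20.0, 21.0, 22.0, 23.0, 27.0],  # P5 spiked by +3
    "b2": [24.0, 25.0, 26.0, 27.0, 31.0],  # b1 plus a +4 loading offset
}
GROUPS = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
SPIKE_INS = {"P5"}


def median_normalize(samples):
    """Shift each sample so its median equals the mean of all sample medians."""
    medians = {s: median(v) for s, v in samples.items()}
    target = mean(list(medians.values()))
    return {s: [x - medians[s] + target for x in v] for s, v in samples.items()}


def collapse_to_protein_mean(samples):
    """Pathological 'normalization': every sample gets the per-protein mean."""
    names = list(samples)
    n = len(samples[names[0]])
    row_means = [mean([samples[s][i] for s in names]) for i in range(n)]
    return {s: list(row_means) for s in names}


def mean_intragroup_sd(samples, groups):
    """Average within-group SD over all proteins and groups (lower = tighter)."""
    n = len(next(iter(samples.values())))
    return mean([
        stdev([samples[s][i] for s in members])
        for members in groups.values()
        for i in range(n)
    ])


def log2_fold_changes(samples, groups, a="A", b="B"):
    """Per-protein difference of group means on the log2 scale (b minus a)."""
    n = len(next(iter(samples.values())))
    return [
        mean([samples[s][i] for s in groups[b]])
        - mean([samples[s][i] for s in groups[a]])
        for i in range(n)
    ]


def spike_in_confusion(samples, threshold=1.0):
    """Call DE with a naive |log2FC| cutoff and score the calls against SPIKE_INS."""
    fcs = log2_fold_changes(samples, GROUPS)
    called = {p for p, fc in zip(PROTEINS, fcs) if abs(fc) >= threshold}
    return {
        "TP": len(called & SPIKE_INS),
        "FP": len(called - SPIKE_INS),
        "FN": len(SPIKE_INS - called),
    }


for label, data in [
    ("raw", RAW),
    ("median", median_normalize(RAW)),
    ("collapsed", collapse_to_protein_mean(RAW)),
]:
    print(label, round(mean_intragroup_sd(data, GROUPS), 3), spike_in_confusion(data))
# raw 1.768 {'TP': 1, 'FP': 4, 'FN': 0}
# median 0.0 {'TP': 1, 'FP': 0, 'FN': 0}
# collapsed 0.0 {'TP': 0, 'FP': 0, 'FN': 1}
```

In this toy setup both transforms reduce the intragroup SD to zero, yet only median centering removes the loading-offset false positives while keeping the spiked-in `P5`; the collapsing transform erases the true difference as well. Scoring against known spike-ins, rather than looking at variance reduction alone, is what separates the two.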