Nestlé Institute of Food Safety & Analytical Sciences, Nestlé Research, EPFL Innovation Park, 1015, Lausanne, Switzerland.
Nestlé Institute of Food Safety & Analytical Sciences, Nestlé Research, EPFL Innovation Park, 1015, Lausanne, Switzerland; Chemistry and Chemical Engineering Section, School of Basic Sciences, Ecole Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland.
Biosystems. 2022 Jun;215-216:104661. doi: 10.1016/j.biosystems.2022.104661. Epub 2022 Mar 2.
Large-scale proteomic studies have to deal with unwanted variability, especially when samples originate from different centers and multiple analytical batches are needed. Such variability is typically introduced at every step of a clinical research study, from the collection and storage of human biological samples, through sample preparation and spectral data acquisition, to peptide and protein quantification. To remove this diverse and unwanted variability, the protein data are normalized. Several reviews comparing normalization methods in the -omics field have already been published, but far fewer reports focus on proteomic data generated by mass spectrometry (MS). Moreover, most of these reports have dealt only with small datasets.
Here, as a case study, we focused on the normalization of a large MS-based proteomic dataset obtained from an overweight and obese pan-European cohort, evaluating different normalization methods, namely: center standardize, quantile protein, quantile sample, global standardization, ComBat, median centering, mean centering, single standard, and removal of unwanted variation (RUV); some of these are generic normalization methods, while others were created specifically to deal with genomic or metabolomic data. We assessed how relationships between proteins and clinical variables (e.g., gender, triglyceride or cholesterol levels) improved after normalizing the data with each method.
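For illustration only, the following is a minimal sketch of two of the evaluated approaches, median centering and quantile sample normalization, applied to a samples × proteins matrix of log-transformed intensities; the function names and the toy data are assumptions made for this example, not code from the study.

```python
import numpy as np

def median_center(X):
    # Median centering: subtract each sample's median log-intensity so that
    # all samples end up with the same (zero) median.
    return X - np.median(X, axis=1, keepdims=True)

def quantile_normalize_samples(X):
    # Quantile (sample) normalization: map every sample's intensity
    # distribution onto the mean distribution computed across samples.
    ranks = np.argsort(np.argsort(X, axis=1), axis=1)   # within-sample ranks
    reference = np.sort(X, axis=1).mean(axis=0)         # mean sorted distribution
    return reference[ranks]

# Toy data: 4 samples (rows) x 6 proteins (columns) of log2 intensities,
# with an artificial per-sample offset mimicking a batch/loading effect.
rng = np.random.default_rng(42)
X = rng.normal(loc=20.0, scale=2.0, size=(4, 6)) + np.array([[0.0], [1.5], [-1.0], [0.5]])
print(np.round(median_center(X), 2))
print(np.round(quantile_normalize_samples(X), 2))
```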
Some normalization methods were better suited to this particular large-scale shotgun proteomic dataset of human plasma samples labeled with isobaric tags and analyzed by liquid chromatography-tandem MS. In particular, quantile sample normalization, RUV, and mean and median centering performed very well, whereas quantile protein normalization gave worse results than the unnormalized data.
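As a rough illustration of how one might quantify whether a normalization step strengthens protein–clinical-variable relationships (this is not necessarily the paper's exact evaluation metric), the sketch below scores a data matrix by the mean absolute Spearman correlation between each protein and a clinical variable; X_raw, X_norm, and triglycerides are hypothetical inputs.

```python
import numpy as np
from scipy.stats import spearmanr

def association_strength(X, clinical):
    # Mean absolute Spearman correlation between each protein (column of X)
    # and a clinical variable across samples; a higher score is taken here
    # as a crude proxy for better-preserved biological signal.
    rhos = [abs(spearmanr(X[:, j], clinical).correlation) for j in range(X.shape[1])]
    return float(np.mean(rhos))

# Hypothetical usage: compare a raw matrix against its normalized version.
# score_raw  = association_strength(X_raw,  triglycerides)
# score_norm = association_strength(X_norm, triglycerides)
# A larger score_norm than score_raw would suggest the normalization helped.
```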