Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, 410210, India.
BARC Training School Complex, Homi Bhabha National Institute, Anushakti Nagar, Mumbai, 400094, India.
J Transl Med. 2019 May 31;17(1):184. doi: 10.1186/s12967-019-1937-9.
SWATH-MS has emerged as the strategy of choice for biomarker discovery owing to the proteome coverage it achieves during acquisition and the ability it provides to re-interrogate the data. However, in quantitative analysis using SWATH, each sample from the comparison groups is run individually on the mass spectrometer, and the resulting inter-run variation may influence relative quantification and the identification of biomarkers. Normalizing the data to diminish this variation is therefore an essential step in SWATH data processing. In most reported studies, the normalization methods used are those provided in instrument-based data analysis software or those developed for microarray data. This study provides, for the first time, experimental evidence for selecting the normalization method best suited to biomarker identification.
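As a concrete picture of inter-run normalization, here is a minimal Python sketch of one simple approach: global median centering of log2 intensities so that every run ends up with the same median. It is an illustration only, not the method this study recommends. The function `median_normalize` and the run names and intensities are hypothetical.

```python
import math
import statistics

def median_normalize(runs):
    """Median-center log2 intensities so every run shares the same median.

    runs: dict mapping run name -> list of raw intensities (same protein
    order in every run, all values > 0). Returns normalized log2 values.
    """
    log_runs = {name: [math.log2(v) for v in values] for name, values in runs.items()}
    medians = {name: statistics.median(values) for name, values in log_runs.items()}
    # Shift every run to the mean of the per-run medians, keeping the overall scale.
    target = statistics.mean(medians.values())
    return {
        name: [v - medians[name] + target for v in values]
        for name, values in log_runs.items()
    }

# Two hypothetical runs of the same three proteins; run_B was loaded ~2x heavier.
runs = {"run_A": [100.0, 400.0, 1600.0], "run_B": [200.0, 800.0, 3200.0]}
normalized = median_normalize(runs)
```

Because the only difference between the two runs here is a constant loading factor, median centering makes their log2 profiles identical. Real inter-run effects are often intensity-dependent, which a single global shift cannot correct.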
The efficiency of 12 methods in normalizing SWATH-MS data was evaluated against statistical criteria in 'Normalyzer', a tool that provides a comparative evaluation of different normalization methods. Further, the suitability of the normalized data for biomarker discovery was assessed by evaluating, through hierarchical clustering in Genesis software v.1.8.1, the clustering efficiency of differentiators identified from the normalized data on the basis of p-value, fold change, or both.
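The three selection criteria (p-value, fold change, or both) can be sketched in Python as follows, under several assumptions. Input values are already normalized log2 intensities, so the log2 fold change is a difference of group means. The cut-offs (p < 0.05 and |log2 fold change| ≥ 1, i.e. two-fold) are illustrative, not the study's. To stay within the standard library, the Welch t-test p-value uses a normal approximation, which is too liberal for small groups; `scipy.stats.ttest_ind(..., equal_var=False)` would use the t distribution instead. The function names and protein data are hypothetical.

```python
import math
import statistics

def welch_p_value(a, b):
    """Two-sided Welch t-test p-value, normal approximation (illustrative only)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return 1.0 if statistics.mean(a) == statistics.mean(b) else 0.0
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))

def select_differentiators(data, criterion, alpha=0.05, min_log2_fc=1.0):
    """data: protein -> (test log2 values, control log2 values).

    criterion: "p" (p-value only), "fc" (fold change only) or "both".
    """
    selected = []
    for protein, (test, control) in data.items():
        log2_fc = statistics.mean(test) - statistics.mean(control)
        passes_p = welch_p_value(test, control) < alpha
        passes_fc = abs(log2_fc) >= min_log2_fc
        keep = {"p": passes_p, "fc": passes_fc, "both": passes_p and passes_fc}[criterion]
        if keep:
            selected.append(protein)
    return selected

# Hypothetical normalized log2 intensities, 4 test vs 4 control samples.
data = {
    "P1": ([10.0, 10.2, 9.9, 10.1], [8.0, 8.1, 7.9, 8.2]),    # large, consistent change
    "P2": ([10.3, 10.4, 10.3, 10.4], [10.0, 10.1, 10.0, 10.1]),  # small, consistent change
    "P3": ([12.5, 8.5, 13.5, 7.5], [9.0, 9.5, 8.5, 9.0]),    # large but noisy change
    "P4": ([9.0, 9.1, 8.9, 9.0], [9.0, 8.9, 9.1, 9.0]),      # no change
}
```

With these values, the p-value criterion keeps P1 and P2, the fold-change criterion keeps P1 and P3, and requiring both keeps only P1. This shows how the choice of criterion alone changes the candidate list.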
Conventional statistical criteria identified VSN-G as the optimal method for normalizing SWATH data. However, differentiators identified from VSN-G-normalized data failed to segregate the test and control groups. We therefore assessed data normalized by the eleven other methods for their ability to yield differentiators that segregate the study groups. In our datasets, differentiators identified on the basis of p-value from Loess-R-normalized data stratified the study groups optimally.
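One way to check whether a set of differentiators segregates the study groups is to cluster the samples on those proteins and ask whether the top-level split matches the group labels. The Python sketch below assumes average linkage and Euclidean distance, which the abstract does not state for the Genesis analysis. The helpers and sample values are hypothetical.

```python
import math
from itertools import combinations

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def average_linkage_two_clusters(profiles):
    """Agglomerate samples (average linkage) until two clusters remain.

    profiles: sample name -> values of the selected differentiators.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 2:
        # Merge the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

def segregates(profiles, labels):
    """True if each of the two top-level clusters holds samples of a single group."""
    groups = [{labels[name] for name in c} for c in average_linkage_two_clusters(profiles)]
    return all(len(g) == 1 for g in groups) and groups[0] != groups[1]

labels = {"T1": "test", "T2": "test", "C1": "control", "C2": "control"}
# Differentiators that separate the groups.
separating = {"T1": [10.0, 10.3], "T2": [10.2, 10.4], "C1": [8.0, 10.0], "C2": [8.1, 10.1]}
# Differentiators whose values pair each test sample with a control sample.
mixed = {"T1": [9.0, 9.0], "C1": [9.1, 9.0], "T2": [11.0, 11.0], "C2": [11.1, 11.0]}
```

Here `segregates(separating, labels)` is true and `segregates(mixed, labels)` is false. The second case mirrors the situation described above, where a statistically favoured normalization still yields differentiators that do not separate test from control.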
This is the first report of an experimentally tested strategy for SWATH-MS data processing with an emphasis on identifying clinically relevant biomarkers. In this study, normalization of SWATH-MS data by the Loess-R method and identification of differentiators based on p-value were found to be optimal for biomarker discovery. The study also demonstrates the need to base the choice of normalization method on the intended application of the data.
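For readers unfamiliar with loess normalization, the Python sketch below shows the general idea. For each sample, the log-ratio to a reference profile (M) is fitted as a smooth function of average log intensity (A) using locally weighted linear regression with tricube weights, and that fitted trend is subtracted. This is a simplified single-pass illustration against the per-protein mean across samples, with no robustness iterations. It is not Normalyzer's Loess-R implementation, whose exact procedure is not reproduced here. All names and values are hypothetical.

```python
import math
import statistics

def lowess_fit(xs, ys, frac=0.5):
    """Locally weighted linear regression with tricube weights (single pass)."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))  # neighbours used in each local fit
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # local bandwidth
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def loess_normalize(samples, frac=0.5):
    """samples: name -> log2 intensities (same protein order in every sample)."""
    names = list(samples)
    n_proteins = len(samples[names[0]])
    # Reference profile: per-protein mean across all samples (an assumption).
    ref = [statistics.mean(samples[s][i] for s in names) for i in range(n_proteins)]
    result = {}
    for s in names:
        m = [v - r for v, r in zip(samples[s], ref)]        # log-ratio to reference
        a = [(v + r) / 2 for v, r in zip(samples[s], ref)]  # average log intensity
        trend = lowess_fit(a, m, frac)
        result[s] = [v - t for v, t in zip(samples[s], trend)]
    return result

base = [float(x) for x in range(6, 16)]  # hypothetical log2 intensities
samples = {
    "sample_A": base,
    "sample_B": [x + 0.1 * x + 0.5 for x in base],  # intensity-dependent bias
}
loess_normalized = loess_normalize(samples)
```

Unlike a single global shift, the fitted trend can follow a bias that grows with intensity. In this toy example, that brings both samples onto the same profile.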