Benedetti Elisa, Gerstner Nathalie, Pučić-Baković Maja, Keser Toma, Reiding Karli R, Ruhaak L Renee, Štambuk Tamara, Selman Maurice H J, Rudan Igor, Polašek Ozren, Hayward Caroline, Beekman Marian, Slagboom Eline, Wuhrer Manfred, Dunlop Malcolm G, Lauc Gordan, Krumsiek Jan
Department of Physiology and Biophysics, Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10022, USA.
Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany.
Metabolites. 2020 Jul 2;10(7):271. doi: 10.3390/metabo10070271.
Glycomics measurements, like those from all other high-throughput technologies, are subject to technical variation caused by fluctuations in experimental conditions. Removing this non-biological signal from the data is referred to as normalization. In contrast to other omics data types, no systematic evaluation of normalization options for glycomics data has been published so far. In this paper, we assess the quality of different normalization strategies for glycomics data with an innovative approach. It has been shown previously that Gaussian Graphical Models (GGMs) inferred from glycomics data can identify enzymatic steps in the glycan synthesis pathways in a data-driven fashion. Based on this finding, we quantify the quality of a given normalization method by how well a GGM inferred from the correspondingly normalized data reconstructs known synthesis reactions in the glycosylation pathway. The method therefore uses a biological measure of goodness. We analyzed 23 different normalization combinations applied to six large-scale glycomics cohorts across three experimental platforms: Liquid Chromatography - ElectroSpray Ionization - Mass Spectrometry (LC-ESI-MS), Ultra High Performance Liquid Chromatography with Fluorescence Detection (UHPLC-FLD), and Matrix Assisted Laser Desorption Ionization - Fourier Transform Ion Cyclotron Resonance - Mass Spectrometry (MALDI-FTICR-MS). Based on our results, we recommend normalizing glycan data using the 'Probabilistic Quotient' method followed by log-transformation, irrespective of the measurement platform. This recommendation is further supported by an additional analysis in which we ranked normalization methods based on the statistical associations of the normalized data with age, a factor known to associate with glycomics measurements.
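The recommended pipeline (Probabilistic Quotient normalization followed by log-transformation) and the partial correlations underlying a GGM can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a samples-by-glycans matrix of strictly positive intensities, uses the per-glycan median across samples as the reference profile, and omits the optional initial total-area scaling found in some descriptions of the method. The function names (`pqn_normalize`, `pqn_log`, `partial_correlations`) are hypothetical, and the plain inverse-covariance estimator is an assumption, since the abstract does not specify how the GGMs were estimated.

```python
import numpy as np


def pqn_normalize(X):
    """Probabilistic quotient normalization (sketch).

    X: array of shape (n_samples, n_glycans) with strictly positive intensities.
    Assumption: the reference profile is the per-glycan median across samples.
    """
    X = np.asarray(X, dtype=float)
    reference = np.nanmedian(X, axis=0)         # reference profile, one value per glycan
    quotients = X / reference                   # fold change of each value vs. the reference
    dilution = np.nanmedian(quotients, axis=1)  # most probable dilution factor per sample
    return X / dilution[:, None]                # rescale each sample by its dilution factor


def pqn_log(X):
    """Probabilistic quotient normalization followed by natural log-transformation."""
    return np.log(pqn_normalize(X))


def partial_correlations(Z):
    """Partial correlation matrix from the (pseudo-)inverse covariance.

    Z: array of shape (n_samples, n_glycans), e.g. the output of pqn_log.
    Assumption: a plain estimator, reasonable only when n_samples >> n_glycans.
    """
    precision = np.linalg.pinv(np.cov(Z, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)          # standardize and flip sign of off-diagonals
    np.fill_diagonal(pcor, 1.0)
    return pcor


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: a shared glycan profile with noise, times a per-sample dilution.
    profile = rng.uniform(1.0, 10.0, size=5)
    noise = rng.lognormal(mean=0.0, sigma=0.1, size=(200, 5))
    dilution = rng.uniform(0.5, 2.0, size=(200, 1))
    X = profile * noise * dilution

    Z = pqn_log(X)
    print(np.round(partial_correlations(Z), 2))
```

In a GGM, an edge between two glycans corresponds to a partial correlation that is significantly different from zero. The evaluation described in the abstract would then compare those edges with known synthesis reactions in the glycosylation pathway, a step this sketch does not cover.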