Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin 150086, China.
Interdisciplinary Research Center on Biology and Chemistry, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, 200032, China.
Anal Chem. 2020 Apr 7;92(7):5082-5090. doi: 10.1021/acs.analchem.9b05460. Epub 2020 Mar 24.
Untargeted metabolomics based on liquid chromatography-mass spectrometry is affected by nonlinear batch effects, which mask biological effects, lead to nonreproducible results, and are difficult to calibrate. In this study, we propose a novel deep learning model, called Normalization Autoencoder (NormAE), which is based on nonlinear autoencoders (AEs) and adversarial learning. During training of the AE model, an additional classifier and ranker are trained to provide adversarial regularization. The encoder extracts latent representations, and the decoder then reconstructs the data without batch effects. The NormAE method was tested on two real metabolomics data sets. After calibration by NormAE, the quality control samples (QCs) for both data sets clustered most closely in a PCA score plot (average distances decreased from 56.550 and 52.476 to 7.383 and 14.075, respectively) and obtained the highest average correlation coefficients (increasing from 0.873 and 0.907 to 0.997 for both). Additionally, NormAE significantly improved biomarker discovery (the median number of differential peaks increased from 322 and 466 to 1140 and 1622, respectively). NormAE was compared with four commonly used batch effect removal methods, and it produced the best calibration results among them.
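The two QC-based quality measures reported above (average distance between QC samples and average correlation coefficient between QC samples) can be illustrated with a minimal sketch. The abstract does not state exactly how these values were computed, so the definitions here are assumptions: mean pairwise Euclidean distance, taken directly in feature space rather than in PCA score space, and mean pairwise Pearson correlation. The helper names and the toy intensity values are hypothetical and are not taken from the paper's data sets.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_pairwise(samples, fn):
    """Average of fn(a, b) over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    return sum(fn(a, b) for a, b in pairs) / len(pairs)


# Toy data (assumption): four injections of the same pooled QC sample,
# four peaks each. The last two injections come from a second batch
# whose intensities are distorted.
qc_raw = [
    [10.0, 20.0, 30.0, 40.0],  # batch 1
    [10.5, 19.5, 30.5, 39.5],  # batch 1
    [15.0, 22.0, 45.0, 41.0],  # batch 2
    [15.5, 21.5, 45.5, 40.5],  # batch 2
]

# The same injections after a hypothetical batch-effect correction.
qc_corrected = [
    [10.0, 20.0, 30.0, 40.0],
    [10.5, 19.5, 30.5, 39.5],
    [10.2, 20.1, 30.3, 39.8],
    [10.4, 19.8, 30.1, 40.2],
]

for label, qc in (("raw", qc_raw), ("corrected", qc_corrected)):
    dist = mean_pairwise(qc, euclidean)
    corr = mean_pairwise(qc, pearson)
    print(f"{label}: mean QC distance = {dist:.3f}, mean QC correlation = {corr:.3f}")
```

Because QC samples are repeated injections of the same material, a successful correction should shrink the distances between them and push their correlations toward 1, which is the direction of change the abstract reports for NormAE.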