Shaham Uri, Stanton Kelly P, Zhao Jun, Li Huamin, Raddassi Khadir, Montgomery Ruth, Kluger Yuval
Department of Statistics, Yale University, New Haven, CT 06511, USA.
Department of Pathology, Yale School of Medicine, New Haven, CT 06510, USA.
Bioinformatics. 2017 Aug 15;33(16):2539-2546. doi: 10.1093/bioinformatics/btx196.
Sources of variability in experimentally derived data include measurement error in addition to the physical phenomena of interest. This measurement error is a combination of systematic components, originating from the measuring instrument, and random measurement errors. Several novel biological technologies, such as mass cytometry and single-cell RNA-seq (scRNA-seq), are plagued with systematic errors that may severely affect statistical analysis if the data are not properly calibrated.
We propose a novel deep learning approach for removing systematic batch effects. Our method is based on a residual neural network, trained to minimize the Maximum Mean Discrepancy between the multivariate distributions of two replicates, measured in different batches. We apply our method to mass cytometry and scRNA-seq datasets, and demonstrate that it effectively attenuates batch effects.
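As a rough illustration of the quantities named above, the following sketch computes a biased estimate of the squared Maximum Mean Discrepancy between two samples and applies a single residual map of the form x + delta(x). It is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma=2.0`, the synthetic data, the layer sizes, and the hand-set weights are all assumptions chosen for illustration, and no training is performed.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between rows of a and rows of b (assumed kernel choice)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd_squared(x, y, sigma=2.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def residual_map(x, w1, b1, w2, b2):
    """One residual block: returns x + delta(x), with delta a two-layer ReLU net."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return x + hidden @ w2 + b2


# Synthetic stand-ins for two replicates measured in different batches (hypothetical data).
rng = np.random.default_rng(0)
n_cells, n_markers, n_hidden = 200, 5, 8
target = rng.normal(size=(n_cells, n_markers))
source = rng.normal(size=(n_cells, n_markers)) + 1.0  # simulated systematic shift

# With all weights at zero, the residual block is exactly the identity map.
w1 = np.zeros((n_markers, n_hidden))
b1 = np.zeros(n_hidden)
w2 = np.zeros((n_hidden, n_markers))
b2 = np.zeros(n_markers)
identity_out = residual_map(source, w1, b1, w2, b2)

# Hand-set output bias that undoes the simulated shift; a trained network
# would instead learn its weights by minimizing mmd_squared.
corrected = residual_map(source, w1, b1, w2, np.full(n_markers, -1.0))

mmd_before = mmd_squared(source, target)
mmd_after = mmd_squared(corrected, target)
print(f"MMD^2 before: {mmd_before:.4f}, after: {mmd_after:.4f}")
```

The residual form means the map starts at the identity and only has to represent a correction to it, which suits the case where the batch effect is a moderate distortion of the data. In practice the weights would be fitted by gradient descent on the MMD loss in a deep learning framework rather than set by hand.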
Our code and data are publicly available at https://github.com/ushaham/BatchEffectRemoval.git.
Supplementary data are available at Bioinformatics online.