Andrade Aixa X, Nguyen Son, Montillo Albert
ArXiv. 2025 Mar 13:arXiv:2411.06635v3.
Single-cell RNA sequencing (scRNA-seq) data has the potential to provide new insights into cellular heterogeneity and data acquisition; however, a major challenge is disentangling the confounding introduced by technical and biological batch effects. Existing batch correction algorithms suppress and discard these effects rather than quantifying and modeling them. Here, we present scMEDAL, a framework for single-cell Mixed Effects Deep Autoencoder Learning, which separately models batch-invariant and batch-specific effects using two complementary autoencoder networks. One network is trained through adversarial learning to capture a batch-invariant representation, while a Bayesian autoencoder learns a batch-specific representation. Comprehensive evaluations spanning conditions (e.g., autism, leukemia, and cardiovascular), cell types, and technical and biological effects demonstrate that scMEDAL suppresses batch effects while modeling batch-specific variation, enhancing accuracy and interpretability. Unlike prior approaches, the framework's fixed- and random-effects autoencoders enable retrospective analyses, including predicting a cell's expression as if it had been acquired in a different batch via genomap projections at the cellular level, revealing the impact of biological (e.g., diagnosis) and technical (e.g., acquisition) effects. Combining scMEDAL's batch-agnostic and batch-specific latent spaces enables more accurate predictions of disease status, donor group, and cell type, making scMEDAL a valuable framework for gaining deeper insight into data acquisition and cellular heterogeneity.
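The two-network idea can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the layer sizes, the linear adversary `W_adv`, the additive combination of the two reconstructions, and every function name are assumptions made for illustration, and no training loop is included. It shows a shared (fixed-effects) autoencoder scored with a generic domain-adversarial objective, a random-effects autoencoder whose weights are batch-specific Gaussian variables, and a counterfactual reconstruction obtained by swapping the batch index.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_latent, n_batches = 20, 2, 3  # toy sizes, chosen arbitrarily


def init(*shape):
    return rng.normal(scale=0.1, size=shape)


# Fixed-effects autoencoder: one set of weights shared by every batch.
W_enc_fe, W_dec_fe = init(n_genes, n_latent), init(n_latent, n_genes)
# Hypothetical linear batch classifier acting as the adversary on the latent code.
W_adv = init(n_latent, n_batches)

# Random-effects autoencoder: each batch has its own weights, described by a
# Gaussian posterior (a mean plus a standard deviation) instead of a point value.
mu_enc_re = init(n_batches, n_genes, n_latent)
mu_dec_re = init(n_batches, n_latent, n_genes)
log_sd_re = -3.0  # shared log standard deviation, fixed here for brevity


def fixed_effects(x):
    z = np.tanh(x @ W_enc_fe)
    return z, z @ W_dec_fe


def adversarial_objective(x, batch, lam=1.0):
    """Reconstruction error minus lam times the adversary's cross-entropy.

    Minimizing this with respect to the autoencoder rewards latent codes from
    which the adversary cannot recover the batch label (assumed generic form).
    """
    z, x_hat = fixed_effects(x)
    mse = np.mean((x - x_hat) ** 2)
    logits = z @ W_adv
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    cross_entropy = -np.mean(log_p[np.arange(len(batch)), batch])
    return mse - lam * cross_entropy, mse, cross_entropy


def random_effects(x, batch, sample=False):
    """Encode and decode one cell with the weights belonging to `batch`."""
    W_enc, W_dec = mu_enc_re[batch], mu_dec_re[batch]
    if sample:  # draw weights from the Gaussian posterior instead of its mean
        sd = np.exp(log_sd_re)
        W_enc = W_enc + sd * rng.standard_normal(W_enc.shape)
        W_dec = W_dec + sd * rng.standard_normal(W_dec.shape)
    z = np.tanh(x @ W_enc)
    return z, z @ W_dec


def reconstruct_as_batch(x, batch):
    """Batch-invariant reconstruction plus the chosen batch's specific part."""
    _, x_fe = fixed_effects(x)
    _, x_re = random_effects(x, batch)
    return x_fe + x_re


cells = rng.normal(size=(5, n_genes))
batches = np.array([0, 1, 2, 0, 1])
objective, mse, ce = adversarial_objective(cells, batches, lam=0.5)
factual = reconstruct_as_batch(cells[0], batch=0)         # the cell's own batch
counterfactual = reconstruct_as_batch(cells[0], batch=2)  # "as if" from batch 2
```

In this sketch the only difference between `factual` and `counterfactual` is the batch index passed to the random-effects network, which is one simple way to read "predicting a cell's expression as if it had been acquired in a different batch"; the published framework may combine the two networks differently.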