Andrade Aixa X, Nguyen Son, Montillo Albert
Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
Res Sq. 2025 Mar 19:rs.3.rs-6081478. doi: 10.21203/rs.3.rs-6081478/v1.
scRNA-seq data has the potential to provide new insights into cellular heterogeneity and data acquisition; however, a major challenge is unraveling the confounding introduced by technical and biological batch effects. Existing batch correction algorithms suppress and discard these effects rather than quantifying and modeling them. Here, we present scMEDAL, a framework that separately models batch-invariant and batch-specific effects using two complementary autoencoder networks. One network is trained through adversarial learning to capture a batch-invariant representation, while the other autoencoder learns a batch-specific representation. Comprehensive evaluations spanning conditions (e.g., autism, leukemia, and cardiovascular disease), cell types, and technical and biological effects demonstrate that scMEDAL suppresses batch effects while modeling batch-specific variation, enhancing accuracy and interpretability. Unlike prior approaches, the framework's fixed- and random-effects autoencoders enable retrospective analyses, including predicting a cell's expression as if it had been acquired in a different batch via genomap projections at the cellular level, revealing the impact of biological (e.g., diagnosis) and technical (e.g., acquisition) effects. Combining scMEDAL's batch-agnostic and batch-specific latent spaces enables more accurate predictions of disease status, donor group, and cell type, making scMEDAL a valuable framework for gaining deeper insight into data acquisition and cellular heterogeneity.
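As a rough illustration of the two ideas in the abstract (an adversarially trained batch-invariant encoder, and the combination of the two latent spaces), the following toy sketch may help. It is not the authors' implementation: the linear layers, the loss form (reconstruction error minus a weighted adversary cross-entropy, a generic domain-adversarial formulation), the weight `lam`, and all function and variable names are assumptions made for illustration, and the random weights stand in for trained parameters.

```python
import math
import random


def linear(x, W):
    """Return x @ W for a vector x (length n) and a matrix W (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) for j in range(len(W[0]))]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def encoder_objective(x, batch, enc, dec, adv, lam=1.0):
    """Hypothetical objective for the batch-invariant encoder.

    The encoder and decoder minimize reconstruction error while maximizing
    the adversary's cross-entropy on the batch label; the adversary itself
    would be trained separately to minimize that cross-entropy.
    """
    z = linear(x, enc)                      # latent code
    x_hat = linear(z, dec)                  # reconstruction
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    probs = softmax(linear(z, adv))         # adversary's batch prediction
    adv_ce = -math.log(probs[batch])
    return recon - lam * adv_ce, recon, adv_ce, z


rng = random.Random(0)
n_genes, n_latent, n_batches = 6, 2, 3      # toy sizes, chosen for illustration
x = [rng.random() for _ in range(n_genes)]  # one cell's expression vector
batch = 1                                   # index of the cell's batch

enc = init_matrix(n_genes, n_latent, rng)
dec = init_matrix(n_latent, n_genes, rng)
adv = init_matrix(n_latent, n_batches, rng)
total, recon, adv_ce, z_invariant = encoder_objective(x, batch, enc, dec, adv, lam=0.5)

# A second encoder stands in for the batch-specific network; concatenating
# the two latent codes gives the combined features for a downstream classifier.
enc_specific = init_matrix(n_genes, n_latent, rng)
z_specific = linear(x, enc_specific)
z_combined = z_invariant + z_specific
```

In this sketch, a larger `lam` pushes the encoder harder toward latent codes from which the batch cannot be predicted, at some cost in reconstruction quality. The concatenated vector `z_combined` is what a downstream classifier for disease status, donor group, or cell type would take as input.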