School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA.
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA.
Nat Commun. 2024 Jan 30;15(1):912. doi: 10.1038/s41467-024-45227-w.
Single-cell RNA-sequencing (scRNA-seq) has been widely used for disease studies, where sample batches are collected from donors under different conditions, including demographic groups, disease stages, and drug treatments. The differences among sample batches in such a study are a mixture of technical confounders caused by batch effect and biological variations caused by condition effect. However, current batch effect removal methods often eliminate both the technical batch effect and the meaningful condition effect, while perturbation prediction methods focus solely on condition effect and therefore predict gene expression inaccurately because batch effect is left unaccounted for. Here we introduce scDisInFact, a deep learning framework that models both batch effect and condition effect in scRNA-seq data. scDisInFact learns latent factors that disentangle condition effect from batch effect, enabling it to perform three tasks simultaneously: batch effect removal, condition-associated key gene detection, and perturbation prediction. We evaluate scDisInFact on both simulated and real datasets, and compare its performance with baseline methods for each task. Our results demonstrate that scDisInFact outperforms existing methods that focus on individual tasks, providing a more comprehensive and accurate approach for integrating and predicting multi-batch multi-condition scRNA-seq data.
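To make the distinction between batch effect and condition effect concrete, the following is a deliberately simplified linear toy, not the scDisInFact model itself (which is a deep learning framework whose architecture is not described in this abstract). It assumes simulated data in which each gene's expression is a baseline plus an additive batch shift and an additive condition shift. All variable names and the simulated values are illustrative assumptions. The sketch separates the two shifts by least squares and then mimics the three tasks named above.

```python
import numpy as np

# Toy illustration only: additive linear effects on simulated data.
# This is NOT the scDisInFact model; all names here are hypothetical.
rng = np.random.default_rng(0)
n_cells, n_genes = 400, 5

# Each cell has a batch label (technical) and a condition label (biological).
batch = rng.integers(0, 2, size=n_cells)
cond = rng.integers(0, 2, size=n_cells)

base = rng.normal(size=n_genes)         # baseline expression per gene
batch_shift = rng.normal(size=n_genes)  # effect of batch 1 relative to batch 0
cond_shift = rng.normal(size=n_genes)   # effect of condition 1 relative to condition 0

noise = 0.1 * rng.normal(size=(n_cells, n_genes))
X = base + np.outer(batch, batch_shift) + np.outer(cond, cond_shift) + noise

# Disentangle the two effects: regress expression on intercept, batch, condition.
design = np.column_stack([np.ones(n_cells), batch, cond])
coef, *_ = np.linalg.lstsq(design, X, rcond=None)
est_batch_shift, est_cond_shift = coef[1], coef[2]

# Task 1, batch effect removal: subtract only the estimated batch shift,
# leaving the condition shift in the data.
X_corrected = X - np.outer(batch, est_batch_shift)

# Task 2, condition-associated key gene detection: rank genes by the
# magnitude of the estimated condition shift.
key_gene_ranking = np.argsort(-np.abs(est_cond_shift))

# Task 3, perturbation prediction: predict how condition-0 cells would
# look under condition 1 by adding the estimated condition shift.
ctrl = cond == 0
X_predicted = X_corrected[ctrl] + est_cond_shift
```

The point of the toy is that removing the batch shift and predicting a condition change both rely on having estimated the two effects separately, which requires batches and conditions that are not fully confounded in the study design.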