Zhang Ziqi, Zhao Xinye, Qiu Peng, Zhang Xiuwei
School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, United States.
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, United States.
bioRxiv. 2023 May 2:2023.05.01.538975. doi: 10.1101/2023.05.01.538975.
Single-cell RNA-sequencing (scRNA-seq) has been widely used for disease studies, where sample batches are collected from donors under different conditions, including demographic groups, disease stages, and drug treatments. The differences among sample batches in such a study are a mixture of technical confounders caused by batch effects and biological variations caused by condition effects. However, current batch effect removal methods often eliminate both technical batch effects and meaningful condition effects, while perturbation prediction methods focus solely on condition effects, resulting in inaccurate gene expression predictions due to unaccounted-for batch effects. Here we introduce scDisInFact, a deep learning framework that models both batch effects and condition effects in scRNA-seq data. scDisInFact learns latent factors that disentangle condition effects from batch effects, enabling it to simultaneously perform three tasks: batch effect removal, condition-associated key gene detection, and perturbation prediction. We evaluated scDisInFact on both simulated and real datasets, and compared its performance to baseline methods for each task. Our results demonstrate that scDisInFact outperforms existing methods that focus on individual tasks, providing a more comprehensive and accurate approach for integrating multi-batch, multi-condition scRNA-seq data and predicting gene expression from it.
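To make the confounding problem concrete, the following is a minimal toy sketch, not the scDisInFact model. It assumes a purely additive simulation in which each batch was collected under a single condition, and it uses naive per-batch mean centering as a stand-in for a generic batch effect removal step. All names, gene counts, and shift values are hypothetical choices for illustration. Under these assumptions, the centering step erases the condition effect together with the batch effect.

```python
import numpy as np


def simulate(n_cells=50, seed=0):
    """Toy additive model: expression = baseline + batch shift + condition shift + noise.

    All values are hypothetical. Batch 0 holds only control cells and batch 1
    holds only treated cells, so batch and condition are fully confounded.
    """
    rng = np.random.default_rng(seed)
    baseline = np.array([5.0, 3.0, 1.0])        # three hypothetical genes
    batch_shift = np.array([[0.0, 0.0, 0.0],
                            [1.0, 1.0, 1.0]])   # technical offset of batch 1
    cond_shift = np.array([[0.0, 0.0, 0.0],
                           [0.0, 2.0, 0.0]])    # treatment raises gene 1 only
    batch = np.repeat([0, 1], n_cells)
    cond = batch.copy()                         # one condition per batch
    noise = rng.normal(0.0, 0.1, size=(2 * n_cells, 3))
    X = baseline + batch_shift[batch] + cond_shift[cond] + noise
    return X, batch, cond


def center_per_batch(X, batch):
    """Naive batch correction: subtract each batch's per-gene mean."""
    Xc = X.copy()
    for b in np.unique(batch):
        mask = batch == b
        Xc[mask] -= X[mask].mean(axis=0)
    return Xc


def condition_gap(X, cond):
    """Per-gene difference in mean expression, treated minus control."""
    return X[cond == 1].mean(axis=0) - X[cond == 0].mean(axis=0)


X, batch, cond = simulate()
gap_raw = condition_gap(X, cond)                                # batch and condition effects mixed
gap_centered = condition_gap(center_per_batch(X, batch), cond)  # both effects removed
print("gap before centering:", np.round(gap_raw, 2))
print("gap after centering: ", np.round(gap_centered, 2))
```

In the raw data the treated-minus-control gap mixes the batch offset with the condition shift. After centering, the gap is zero for every gene, including the gene that genuinely responds to treatment. This is the failure mode that motivates modeling the two effects with separate factors.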