Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA.
Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA.
Genome Biol. 2022 Sep 5;23(1):186. doi: 10.1186/s13059-022-02749-0.
Current methods for analyzing single-cell datasets have relied primarily on static gene expression measurements to characterize the molecular state of individual cells. However, capturing temporal changes in cell state is crucial for the interpretation of dynamic phenotypes such as the cell cycle, development, or disease progression. RNA velocity infers the direction and speed of transcriptional changes in individual cells, yet it is unclear how these temporal gene expression modalities may be leveraged for predictive modeling of cellular dynamics.
Here, we present the first task-oriented benchmarking study that investigates integration of temporal sequencing modalities for dynamic cell state prediction. We benchmark ten integration approaches on ten datasets spanning different biological contexts, sequencing technologies, and species. We find that integrating these modalities yields more accurate inference of biological trajectories and improves performance when classifying cells according to perturbation and disease states. Furthermore, we show that simple concatenation of spliced and unspliced molecules performs consistently well on classification tasks and can be used in place of more memory-intensive and computationally expensive methods.
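As a rough illustration of what simple concatenation could look like in practice, the sketch below joins a spliced and an unspliced count matrix feature-wise, so that each cell is described by both modalities. It is not the study's implementation: the function name `concatenate_modalities` is hypothetical, the matrices are assumed to be cells by genes with matching cell order, the counts are randomly generated toy values, and any normalization or feature selection applied in the benchmark is omitted.

```python
import numpy as np


def concatenate_modalities(spliced, unspliced):
    """Join two cells-by-genes matrices column-wise (hypothetical helper).

    Assumes both matrices list the same cells in the same row order.
    """
    spliced = np.asarray(spliced, dtype=float)
    unspliced = np.asarray(unspliced, dtype=float)
    if spliced.shape[0] != unspliced.shape[0]:
        raise ValueError("Both modalities must contain the same number of cells.")
    # Each cell's feature vector becomes [spliced genes | unspliced genes].
    return np.hstack([spliced, unspliced])


# Toy counts for illustration only (6 cells, 4 genes); not real data.
rng = np.random.default_rng(0)
spliced = rng.poisson(5.0, size=(6, 4))
unspliced = rng.poisson(1.0, size=(6, 4))

integrated = concatenate_modalities(spliced, unspliced)
print(integrated.shape)
```

The resulting matrix has one row per cell and twice as many columns as genes, and could then be passed to any standard classifier as its feature matrix.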
This work illustrates how integrated temporal gene expression modalities may be leveraged for predicting cellular trajectories and sample-associated perturbation and disease phenotypes. Additionally, this study provides users with practical recommendations for task-specific integration of single-cell gene expression modalities.