Zhejiang University, Hangzhou, 310058, China.
AI Division, School of Engineering, Westlake University, Hangzhou, 310024, China.
Commun Biol. 2023 Apr 4;6(1):369. doi: 10.1038/s42003-023-04662-z.
Dimensionality reduction and visualization play an important role in biological data analysis, for example in interpreting single-cell RNA sequencing (scRNA-seq) data. A desirable visualization method should be applicable to various application scenarios, including cell clustering and trajectory inference, and should also satisfy a variety of technical requirements, especially the ability to preserve the inherent structure of the data and to handle batch effects. However, no existing method accommodates these requirements in a unified framework. In this paper, we propose a general visualization method, deep visualization (DV), that preserves the inherent structure of the data, handles batch effects, and is applicable to a variety of datasets from different application domains and dataset scales. The method embeds a given dataset into a 2- or 3-dimensional visualization space with either a Euclidean or a hyperbolic metric, depending on whether the task involves static (a single time point) or dynamic (a sequence of time points) scRNA-seq data, respectively. Specifically, DV learns a structure graph to describe the relationships between data samples and transforms the data into the visualization space while preserving the geometric structure of the data and correcting batch effects in an end-to-end manner. Experimental results on nine datasets from complex tissues of human patients or animal development demonstrate the competitiveness of DV in discovering complex cellular relations, uncovering temporal trajectories, and addressing complex batch factors. We also provide a preliminary attempt to pre-train a DV model for visualizing new incoming data.
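To illustrate the difference between the two metrics named in the abstract, the following minimal sketch compares the Euclidean distance with the distance in the Poincaré ball, one common model of hyperbolic space. This is not DV's implementation: the abstract does not say which hyperbolic model DV uses, so the Poincaré ball is an assumption here, and the function names `euclidean_distance` and `poincare_distance` are hypothetical.

```python
import numpy as np


def euclidean_distance(u, v):
    """Straight-line distance between two embedded points."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))


def poincare_distance(u, v, eps=1e-12):
    """Geodesic distance in the Poincare ball (assumed hyperbolic model).

    Both points must lie strictly inside the unit ball (norm < 1).
    """
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    # The denominator shrinks toward 0 as either point nears the boundary,
    # which is what makes distances grow rapidly there.
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps)))


if __name__ == "__main__":
    origin, mid, edge = [0.0, 0.0], [0.5, 0.0], [0.99, 0.0]
    print(euclidean_distance(origin, mid), poincare_distance(origin, mid))
    print(euclidean_distance(mid, edge), poincare_distance(mid, edge))
```

In the Poincaré ball, distances grow quickly as points approach the boundary, which is why hyperbolic embeddings are often used for tree-like or branching structure such as developmental trajectories.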