HCI/IWR, Heidelberg University, 69120 Heidelberg, Germany.
Université Gustave Eiffel, CNRS, LIGM, F-77454 Marne-la-Vallée, France.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i316-i324. doi: 10.1093/bioinformatics/btac249.
Single-cell RNA sequencing (scRNA-seq) makes it possible to study cell development in unprecedented detail. Because many cellular differentiation processes are hierarchical, their scRNA-seq data are expected to be approximately tree-shaped in gene expression space. Inferring this tree structure and representing it in two dimensions is highly desirable for biological interpretation and exploratory analysis.
Our two contributions are an approach for identifying a meaningful tree structure in high-dimensional scRNA-seq data and a visualization method that respects this tree structure. We extract the tree structure by computing a density-based maximum spanning tree on a vector quantization of the data and show that it captures biological information well. We then introduce the density-tree biased autoencoder (DTAE), a tree-biased autoencoder that emphasizes the tree structure of the data in low-dimensional space. We compare DTAE with other dimension reduction methods and demonstrate the success of our method both qualitatively and quantitatively on real and toy data.
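The tree extraction step can be illustrated with a minimal sketch. It is not the DTAE implementation: the centroids are fixed by hand rather than learned by a vector quantization, the density of an edge is assumed to be the number of data points whose two nearest centroids are the endpoints of that edge, and the function names `density_edges` and `maximum_spanning_tree` are hypothetical. Under these assumptions, Kruskal's algorithm applied to the heaviest edges first yields a maximum spanning tree over the centroids.

```python
from collections import Counter
from math import dist


def density_edges(points, centroids):
    """Count, per centroid pair, the points whose two nearest centroids form that pair.

    Assumption for this sketch: this count serves as the edge density.
    """
    counts = Counter()
    for p in points:
        # Indices of the two centroids closest to p.
        i, j = sorted(range(len(centroids)), key=lambda c: dist(p, centroids[c]))[:2]
        counts[(min(i, j), max(i, j))] += 1
    return counts


def maximum_spanning_tree(n_nodes, weighted_edges):
    """Kruskal's algorithm with edges visited in order of decreasing weight."""
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for (u, v), w in sorted(weighted_edges.items(), key=lambda e: -e[1]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it cannot close a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Toy data: three hand-picked centroids standing in for a vector quantization.
centroids = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
points = [
    (1.0, 0.0), (2.0, 0.1), (3.0, 0.0),  # between centroids 0 and 1
    (3.0, 2.0), (3.5, 1.5),              # between centroids 1 and 2
    (1.0, 1.4),                          # between centroids 0 and 2
]

edges = density_edges(points, centroids)
tree = maximum_spanning_tree(len(centroids), edges)
print(tree)  # [(0, 1, 3), (1, 2, 2)]
```

In this toy example the sparse edge between centroids 0 and 2 is dropped, so the tree follows the two densely populated connections.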
Our implementation, which relies on PyTorch and Higra, is available at github.com/hci-unihd/DTAE.
Supplementary data are available at Bioinformatics online.