Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, UK.
Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK.
Bioinformatics. 2022 Feb 7;38(5):1277-1286. doi: 10.1093/bioinformatics/btab804.
Single-cell RNA sequencing allows high-resolution views of individual cells for libraries of up to millions of samples, thus motivating the use of deep learning for analysis. In this study, we introduce the use of graph neural networks for the unsupervised exploration of scRNA-seq data by developing a variational graph autoencoder architecture with graph attention layers that operates directly on the connectivity between cells, focusing on dimensionality reduction and clustering. With the help of several case studies, we show that our model, named CellVGAE, can be effectively used for exploratory analysis even on challenging datasets, by extracting meaningful features from the data and providing the means to visualize and interpret different aspects of the model.
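As an illustration of what operating "directly on the connectivity between cells" with graph attention can look like, the following is a minimal standard-library sketch, not the CellVGAE implementation. It makes several assumptions that the abstract does not state: the cell graph is built by k-nearest neighbours on Euclidean distance, the attention scores follow the usual graph attention form (a softmax over each cell's neighbours of a LeakyReLU-scored pair), and the decoder is the standard inner-product decoder of variational graph autoencoders. All function names, the toy data, and the attention vectors `a_src` and `a_dst` are hypothetical; the learned weight matrices and the variational sampling step are omitted for brevity.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def leaky_relu(x, slope=0.2):
    """LeakyReLU non-linearity applied to a raw attention score."""
    return x if x > 0 else slope * x


def knn_graph(cells, k):
    """For each cell, return the indices of its k nearest cells (Euclidean).

    Assumption: the abstract does not say how cell connectivity is built;
    k-nearest neighbours is used here purely for illustration.
    """
    neighbors = []
    for i, ci in enumerate(cells):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(cells) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def attention_coefficients(features, neighbors, a_src, a_dst):
    """Graph-attention-style coefficients, one dict of weights per cell.

    Each neighbour j of cell i gets the raw score
    LeakyReLU(a_src . h_i + a_dst . h_j); scores are then normalised
    with a softmax over the neighbourhood of i.
    """
    alphas = []
    for i, nbrs in enumerate(neighbors):
        scores = [
            leaky_relu(dot(a_src, features[i]) + dot(a_dst, features[j]))
            for j in nbrs
        ]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas.append({j: e / total for j, e in zip(nbrs, exps)})
    return alphas


def aggregate(features, alphas):
    """One message-passing step: attention-weighted sum of neighbour features."""
    dim = len(features[0])
    return [
        [sum(w * features[j][d] for j, w in row.items()) for d in range(dim)]
        for row in alphas
    ]


def edge_probability(z_i, z_j):
    """Inner-product decoder: probability of an edge between two embeddings."""
    return 1.0 / (1.0 + math.exp(-dot(z_i, z_j)))


# Toy data: 6 hypothetical cells with 4 expression features each.
rng = random.Random(0)
cells = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
a_src = [rng.gauss(0, 1) for _ in range(4)]  # hypothetical attention vector
a_dst = [rng.gauss(0, 1) for _ in range(4)]  # hypothetical attention vector

neighbors = knn_graph(cells, k=2)
alphas = attention_coefficients(cells, neighbors, a_src, a_dst)
embeddings = aggregate(cells, alphas)
```

In this sketch, `alphas[i]` maps each neighbour of cell `i` to a weight, and the weights for each cell sum to 1. Per-edge weights of this kind are what an analysis of graph attention coefficients would inspect.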
We show that CellVGAE is more interpretable than existing scRNA-seq variational architectures by analysing the graph attention coefficients. By drawing parallels with other scRNA-seq studies on interpretability, we assess the validity of the relationships modelled by attention, and furthermore, we show that CellVGAE can intrinsically capture information such as pseudotime and NF-κB activation dynamics, the latter being a property that is not generally shared by existing neural alternatives. We then evaluate the dimensionality reduction and clustering performance on nine difficult and well-annotated datasets by comparing with three leading neural and non-neural techniques, concluding that CellVGAE outperforms competing methods. Finally, we report a decrease in training times of up to 20× on a dataset of 1.3 million cells compared to existing deep learning architectures.
The CellVGAE code is available at https://github.com/davidbuterez/CellVGAE.
Supplementary data are available at Bioinformatics online.