Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA.
Center for Computational Biology.
Bioinformatics. 2020 Jun 1;36(11):3418-3421. doi: 10.1093/bioinformatics/btaa169.
Single-cell RNA-seq makes it possible to investigate variability in gene expression among cells, and how that variation depends on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable.
We present an approach that modifies a recently published, highly scalable variational autoencoder framework to provide interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely learning factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications.
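As a rough illustration of why a factor model is interpretable, the sketch below assumes the decoder is a linear map from latent factors to genes followed by a softmax. The abstract does not spell out these details, so the functions `decode` and `top_genes`, the gene names, and the loading values are all hypothetical and are not the scVI API. Under that assumption, each decoder weight can be read directly as the loading of one gene on one factor, and the genes with the largest loadings on a factor form a candidate gene program.

```python
import math


def decode(z, loadings):
    """Map latent factors z (length k) to expression proportions over genes.

    loadings[g][j] is the weight of factor j on gene g. The map is linear
    before the softmax, so each weight can be read as a factor loading.
    """
    logits = [sum(w * zj for w, zj in zip(row, z)) for row in loadings]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_genes(loadings, gene_names, factor, n=2):
    """Return the n genes with the largest loading on one factor."""
    ranked = sorted(
        zip(gene_names, loadings),
        key=lambda pair: pair[1][factor],
        reverse=True,
    )
    return [name for name, _ in ranked[:n]]


# Toy, made-up loadings (assumption): 4 hypothetical genes, 2 factors.
gene_names = ["geneA", "geneB", "geneC", "geneD"]
loadings = [
    [2.0, 0.0],
    [1.5, -0.5],
    [0.0, 1.8],
    [-1.0, 2.2],
]

# A cell that is active only on factor 0.
proportions = decode([1.0, 0.0], loadings)

# Candidate gene programs read off the loading matrix.
program_0 = top_genes(loadings, gene_names, factor=0)
program_1 = top_genes(loadings, gene_names, factor=1)
```

In a trained model the loadings would be learned by variational inference rather than written by hand; the toy matrix only shows how gene programs could be read off once the weights exist.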
The factor model is available in the scVI package hosted at https://github.com/YosefLab/scVI/.
Supplementary data are available at Bioinformatics online.