Department of Electrical Engineering, Stanford University, Stanford, California, 94305, USA.
BD Genomics, Menlo Park, California, 94025, USA.
BMC Bioinformatics. 2018 Mar 9;19(1):93. doi: 10.1186/s12859-018-2092-7.
With the recent proliferation of single-cell RNA-Seq experiments, several methods have been developed for unsupervised analysis of the resulting datasets. These methods often rely on unintuitive hyperparameters and do not explicitly address the subjectivity associated with clustering.
In this work, we present DendroSplit, an interpretable framework for analyzing single-cell RNA-Seq datasets that addresses both the clustering interpretability and clustering subjectivity issues. DendroSplit offers a novel perspective on the single-cell RNA-Seq clustering problem motivated by the definition of "cell type", allowing us to cluster using feature selection to uncover multiple levels of biologically meaningful populations in the data. We analyze several landmark single-cell datasets, demonstrating both the method's efficacy and computational efficiency.
DendroSplit offers a clustering framework that is comparable to existing methods in terms of accuracy and speed but is novel in its emphasis on interpretability. We provide the full DendroSplit software package at https://github.com/jessemzhang/dendrosplit.