Xie Kaikun, Huang Yu, Zeng Feng, Liu Zehua, Chen Ting
Institute for Artificial Intelligence, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Department of Automation, Xiamen University, Xiamen 361005, China.
NAR Genom Bioinform. 2020 Oct 9;2(4):lqaa082. doi: 10.1093/nargab/lqaa082. eCollection 2020 Dec.
Recent advances in single-cell RNA-sequencing technology and computational resources have made it possible to study cell types across entire cell populations. Up to millions of cells can now be sequenced in one experiment; accurate and efficient computational methods are therefore needed for clustering and for the downstream assignment of putative and rare cell types. Here, we present scAIDE, a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first combines an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of the data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a dataset of 1.3 million neural cells within 30 min, obtaining 64 clusters that were mapped to 19 putative cell types. Within these clusters, we further identified three different neural stem cell developmental trajectories. We also used scAIDE to classify two subpopulations of malignant cells in a small glioblastoma dataset. We anticipate that scAIDE will provide a more in-depth understanding of cell development and disease.
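To make the second stage more concrete, the following is a minimal Python sketch of seeding k-means from random projection hash buckets. It is not the scAIDE implementation: the function names (`rp_hash_codes`, `bucket_centers`, `assign`, `kmeans`), the sign-based hashing, and the rule of taking the most populated buckets as initial centers are all assumptions made for illustration, and the toy matrix `X` merely stands in for the low-dimensional embedding that AIDE would produce. The sketch shows only the general idea of hash-based seeding and does not attempt to reproduce the paper's handling of rare cell types.

```python
import numpy as np


def rp_hash_codes(X, n_bits=4, seed=0):
    """Hash each row of X to an integer bucket using signs of random projections."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))   # one random hyperplane per bit
    bits = ((X - X.mean(axis=0)) @ planes) > 0            # side of each hyperplane
    # Pack the n_bits booleans of each row into a single integer code.
    return bits.astype(np.int64) @ (1 << np.arange(n_bits, dtype=np.int64))


def bucket_centers(X, codes, k):
    """Centroids of the k most populated buckets (illustrative seeding rule)."""
    uniq, counts = np.unique(codes, return_counts=True)
    if len(uniq) < k:
        raise ValueError("fewer non-empty buckets than clusters; increase n_bits")
    top = uniq[np.argsort(-counts)[:k]]
    return np.stack([X[codes == c].mean(axis=0) for c in top])


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, centers, n_iter=20):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):                  # an empty cluster keeps its center
                centers[j] = members.mean(axis=0)
    return assign(X, centers), centers


# Toy stand-in for an embedding: two Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 0.5, (100, 5)), rng.normal(3.0, 0.5, (100, 5))])

codes = rp_hash_codes(X, n_bits=4)       # bucket id per cell
init = bucket_centers(X, codes, k=2)     # hash-derived initial centers
labels, centers = kmeans(X, init)        # refine with Lloyd iterations
```

Hashing costs one matrix product over the data, so this kind of seeding stays cheap as the number of cells grows, which is the property that matters at the scale described above.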