Li Xiangyu, Chen Weizheng, Chen Yang, Zhang Xuegong, Gu Jin, Zhang Michael Q
MOE Key Laboratory of Bioinformatics, TNLIST Bioinformatics Division/Center for Synthetic & System Biology, Department of Automation, Tsinghua University, Beijing 100084, China.
Institute of Network Computing and Information System, Department of Computer Science, Peking University, Beijing 100871, China.
Nucleic Acids Res. 2017 Nov 2;45(19):e166. doi: 10.1093/nar/gkx750.
Single-cell RNA-seq (scRNA-seq) techniques can reveal valuable insights into cell-to-cell heterogeneity. Projecting high-dimensional data into a low-dimensional subspace is, in general, a powerful strategy for mining such big data. However, scRNA-seq suffers from higher noise and lower coverage than traditional bulk RNA-seq, which introduces new computational difficulties. One major challenge is how to deal with frequent drop-out events. These events, usually caused by the stochastic burst effect in gene transcription and by technical failures of RNA transcript capture, often cause traditional dimension reduction methods to work inefficiently. To overcome this problem, we have developed a novel Single Cell Representation Learning (SCRL) method based on network embedding. This method can efficiently implement data-driven non-linear projection and incorporate prior biological knowledge (such as pathway information) to learn more meaningful low-dimensional representations for both cells and genes. Benchmark results show that SCRL outperforms other dimension reduction methods on several recent scRNA-seq datasets.
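The abstract does not specify how the networks fed to the embedding step are built, so the following is only an illustrative sketch of the general idea, not the SCRL implementation. It assumes that expression counts can be turned into weighted cell-gene edges (with zero counts, which may be drop-outs, contributing no edge) and that pathway membership can be turned into gene-gene edges. The function name `build_edges`, the `log1p` weighting, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def build_edges(counts, cells, genes, pathways):
    """Build two toy edge lists that a network embedding method could consume.

    Assumptions (not taken from the paper): edge weight is log1p(count),
    and every pair of genes sharing a pathway gets one unweighted edge.
    """
    cell_gene = []
    for i, cell in enumerate(cells):
        for j, gene in enumerate(genes):
            count = counts[i][j]
            if count > 0:  # zero counts (possible drop-outs) add no edge
                cell_gene.append((cell, gene, math.log1p(count)))

    measured = set(genes)
    gene_gene = set()
    for members in pathways.values():
        # Keep only pathway genes that were actually measured.
        present = sorted(g for g in members if g in measured)
        gene_gene.update(combinations(present, 2))
    return cell_gene, sorted(gene_gene)


# Toy inputs: 2 cells x 3 genes, plus one hypothetical pathway.
cells = ["cell_a", "cell_b"]
genes = ["G1", "G2", "G3"]
counts = [[5, 0, 2], [0, 3, 0]]
pathways = {"toy_pathway": ["G1", "G2", "G9"]}

cell_gene_edges, gene_gene_edges = build_edges(counts, cells, genes, pathways)
```

In this sketch the prior knowledge enters only through `gene_gene_edges`, so genes that never co-occur in any single cell because of drop-outs can still be linked through a shared pathway.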