Lin Chieh, Jain Siddhartha, Kim Hannah, Bar-Joseph Ziv
Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Computer Science Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Nucleic Acids Res. 2017 Sep 29;45(17):e156. doi: 10.1093/nar/gkx681.
While only recently developed, the ability to profile expression data in single cells (scRNA-Seq) has already led to several important studies and findings. However, this technology has also raised several new computational challenges. These include questions about the best methods for clustering scRNA-Seq data, how to identify unique groups of cells in such experiments, and how to determine the state or function of specific cells based on their expression profiles. To address these issues, we developed and tested a method based on neural networks (NN) for the analysis and retrieval of single cell RNA-Seq data. We tested various NN architectures, some of which incorporate prior biological knowledge, and used these to obtain a reduced dimension representation of the single cell expression data. We show that the NN method improves upon prior methods in both the ability to correctly group cells in experiments not used in training and the ability to correctly infer cell type or state by querying a database of tens of thousands of single cell profiles. Such database queries (which can be performed using our web server) will enable researchers to better characterize cells when analyzing heterogeneous scRNA-Seq samples.
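The retrieval step described in the abstract (inferring a cell's type by querying a database of reduced dimension profiles) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the embeddings are taken as already computed by some trained encoder, similarity is measured with cosine similarity, the label is chosen by majority vote among the k nearest profiles, and the names `cosine_similarity`, `query_cell_type`, and `toy_database` as well as the toy values and cell type labels are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_cell_type(query_embedding, database, k=3):
    """Return the most common label among the k database profiles most similar to the query.

    `database` is a list of (embedding, label) pairs. The embeddings are assumed
    to be reduced dimension representations produced elsewhere (hypothetical here).
    """
    ranked = sorted(
        database,
        key=lambda entry: cosine_similarity(query_embedding, entry[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Toy 3-dimensional embeddings with made-up labels, for illustration only.
toy_database = [
    ([0.9, 0.1, 0.0], "neuron"),
    ([0.8, 0.2, 0.1], "neuron"),
    ([0.1, 0.9, 0.2], "hepatocyte"),
    ([0.0, 0.8, 0.3], "hepatocyte"),
    ([0.2, 0.1, 0.9], "T cell"),
]

predicted = query_cell_type([0.85, 0.15, 0.05], toy_database, k=3)
print(predicted)
```

A real database of tens of thousands of profiles would likely call for a vectorized or indexed nearest-neighbor search rather than sorting the full list for each query; the linear scan above is only meant to make the idea concrete.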