Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.
Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.
PLoS Comput Biol. 2020 Oct 27;16(10):e1007939. doi: 10.1371/journal.pcbi.1007939. eCollection 2020 Oct.
Several studies profile similar single-cell RNA-Seq (scRNA-Seq) data using different technologies and platforms. A number of alignment methods have been developed to enable the integration and comparison of scRNA-Seq data from such studies. While each performs well on some datasets, to date no method has been able both to perform the alignment in the original expression space and to generalize to new data. To enable such analysis, we developed Single Cell Iterative Point set Registration (SCIPR), which extends to scRNA-Seq methods that were successfully applied to align image data. We discuss the required changes, the resulting optimization function, and algorithms for learning a transformation function for aligning data. We tested SCIPR on several scRNA-Seq datasets. As we show, it successfully aligns data from several different cell types, improving upon prior methods proposed for this task. In addition, we show that the parameters learned by SCIPR can be used to align data not used in training and to identify key cell type-specific genes.
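The general idea of iterative point set registration with a learned transformation can be sketched as follows. This is a minimal illustration under stated assumptions, not the SCIPR implementation: it assumes plain nearest-neighbor matching and a closed-form least-squares affine fit, which may differ from the matching and fitting strategies the paper uses. The function names (`fit_affine`, `nearest_neighbors`, `icp_align`, `mean_nn_sqdist`) are hypothetical. Rows are cells and columns are genes, so the learned map acts directly in the expression space and can be applied to cells not seen during fitting.

```python
import numpy as np


def nearest_neighbors(src, dst):
    """For each row of src, return the index of the closest row of dst."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def fit_affine(src, dst):
    """Least-squares affine map: W, b minimizing ||src @ W + b - dst||^2."""
    design = np.hstack([src, np.ones((src.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return coef[:-1], coef[-1]


def icp_align(source, target, n_iter=10):
    """Alternate matching and affine fitting; return the composed (W, b).

    Assumption: every source cell is matched to its nearest target cell.
    """
    n_genes = source.shape[1]
    W_total = np.eye(n_genes)
    b_total = np.zeros(n_genes)
    current = source.copy()
    for _ in range(n_iter):
        idx = nearest_neighbors(current, target)      # matching step
        W, b = fit_affine(current, target[idx])       # fitting step
        current = current @ W + b                     # move source cells
        # Compose with earlier steps: x -> (x @ W_total + b_total) @ W + b
        W_total = W_total @ W
        b_total = b_total @ W + b
    return W_total, b_total


def mean_nn_sqdist(src, dst):
    """Mean squared distance from each src row to its nearest dst row."""
    idx = nearest_neighbors(src, dst)
    return float(((src - dst[idx]) ** 2).sum(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(50, 3))   # toy "batch B": 50 cells, 3 genes
    source = target + 0.05              # toy "batch A": shifted copy
    W, b = icp_align(source, target, n_iter=5)
    print(mean_nn_sqdist(source, target),
          mean_nn_sqdist(source @ W + b, target))
```

Because the result is an explicit pair `(W, b)`, held-out cells can be aligned with `new_cells @ W + b`, and the entries of `W` can be inspected per gene. The pairwise distance matrix in `nearest_neighbors` is quadratic in the number of cells, so a real dataset would need an indexed or approximate neighbor search.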