Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Department of Human Genetics and Department of Medicine, University of Chicago, Chicago, IL 60637, USA.
Nucleic Acids Res. 2022 Feb 28;50(4):e21. doi: 10.1093/nar/gkab1147.
Data alignment is one of the first key steps in single-cell analysis for integrating multiple datasets and performing joint analysis across studies. Data alignment is challenging in extremely large datasets, however, as the majority of current single-cell data alignment methods are not computationally efficient. Here, we present VIPCCA, a computational framework based on non-linear canonical correlation analysis for effective and scalable single-cell data alignment. VIPCCA leverages both deep learning for effective single-cell data modeling and variational inference for scalable computation, thus enabling powerful data alignment across multiple samples, multiple data platforms, and multiple data types. VIPCCA is accurate for a range of alignment tasks, including alignment between single-cell RNAseq and ATACseq datasets, and can easily accommodate millions of cells, thereby providing researchers unique opportunities to tackle challenges emerging from large-scale single-cell atlases.