LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
Bioinformatics. 2021 Dec 22;38(1):211-219. doi: 10.1093/bioinformatics/btab594.
Single-cell multi-omics sequencing data can provide a comprehensive molecular view of cells. However, developing effective approaches for the integrative analysis of such data remains challenging. Existing manifold alignment methods have demonstrated state-of-the-art performance in single-cell multi-omics data integration, but they are often limited by the requirement that the single-cell datasets be derived from the same underlying cellular structure.
In this study, we present Pamona, a partial Gromov-Wasserstein distance-based manifold alignment framework that integrates heterogeneous single-cell multi-omics datasets with the aim of delineating and representing the shared and dataset-specific cellular structures across modalities. We formulate this task as a partial manifold alignment problem and develop a partial Gromov-Wasserstein optimal transport framework to solve it. Pamona identifies both shared and dataset-specific cells from the computed probabilistic couplings of cells across datasets, and it aligns the cellular modalities in a common low-dimensional space while preserving both shared and dataset-specific structures. Our framework can easily incorporate prior information, such as cell type annotations or cell-cell correspondences, to further improve alignment quality. We evaluated Pamona on a comprehensive set of publicly available benchmark datasets and demonstrated that it accurately identifies shared and dataset-specific cells, faithfully recovers and aligns the cellular structures of heterogeneous single-cell modalities in a common space, and outperforms comparable existing methods.
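To make the idea of a partial Gromov-Wasserstein coupling concrete, here is a minimal pure-Python sketch written for illustration only; it is not Pamona's implementation. It assumes a squared loss, entropic regularization (`reg`), a transported mass `m` smaller than the total cell weight, and the standard reduction of partial optimal transport to balanced transport by adding one dummy point per dataset. All function names (`sinkhorn`, `entropic_partial_ot`, `gw_linear_cost`, `partial_gw`, `shared_rows`) and the mass-threshold rule for calling a cell "shared" are hypothetical.

```python
import math


def sinkhorn(a, b, M, reg, n_iter=1000):
    """Entropic balanced OT via Sinkhorn scaling: P = diag(u) K diag(v), K = exp(-M / reg)."""
    K = [[math.exp(-mij / reg) for mij in row] for row in M]
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]


def entropic_partial_ot(p, q, M, m, reg, big=1e3):
    """Partial OT moving total mass m (assumes m <= sum(p) and m <= sum(q)), via dummy points."""
    n2 = len(q)
    # The real block always carries mass m, so shifting it by a constant leaves the
    # optimal plan unchanged; the shift only improves numerical conditioning.
    shift = min(min(row) for row in M)
    a = list(p) + [sum(q) - m]       # dummy row supplies the unmatched mass of dataset 2
    b = list(q) + [sum(p) - m]       # dummy column absorbs the unmatched mass of dataset 1
    Ma = [[mij - shift for mij in row] + [0.0] for row in M]
    Ma.append([0.0] * n2 + [big])    # dummy-to-dummy transport is effectively forbidden
    P = sinkhorn(a, b, Ma, reg)
    return [row[:n2] for row in P[:len(p)]]


def gw_linear_cost(C1, C2, T):
    """M[i][j] = sum_{k,l} (C1[i][k] - C2[j][l])**2 * T[k][l].

    This is the squared-loss GW objective linearised at T; the exact gradient is
    twice this, and dropping the factor 2 only rescales `reg`.
    """
    n1, n2 = len(C1), len(C2)
    r = [sum(row) for row in T]                                   # row masses of T
    c = [sum(T[k][l] for k in range(n1)) for l in range(n2)]      # column masses of T
    CT = [[sum(C1[i][k] * T[k][l] for k in range(n1)) for l in range(n2)] for i in range(n1)]
    M = []
    for i in range(n1):
        first = sum(C1[i][k] ** 2 * r[k] for k in range(n1))
        row = []
        for j in range(n2):
            second = sum(C2[j][l] ** 2 * c[l] for l in range(n2))
            cross = sum(CT[i][l] * C2[j][l] for l in range(n2))
            row.append(first + second - 2.0 * cross)
        M.append(row)
    return M


def partial_gw(C1, C2, p, q, m, reg=0.5, n_outer=20):
    """Entropic partial GW: re-linearise the GW loss, then solve an entropic partial OT."""
    n1, n2 = len(p), len(q)
    scale = m / (sum(p) * sum(q))
    T = [[p[i] * q[j] * scale for j in range(n2)] for i in range(n1)]  # start with mass m
    for _ in range(n_outer):
        T = entropic_partial_ot(p, q, gw_linear_cost(C1, C2, T), m, reg)
    return T


def shared_rows(T, thresh):
    """Hypothetical rule: a cell of dataset 1 is 'shared' if it sends at least `thresh` mass."""
    return [i for i, row in enumerate(T) if sum(row) >= thresh]


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 5.0]   # dataset 1: three cells plus one far-away cell
    y = [0.0, 1.0, 2.0]        # dataset 2: only the shared structure
    C1 = [[abs(s - t) for t in x] for s in x]
    C2 = [[abs(s - t) for t in y] for s in y]
    p, q = [0.25] * 4, [1 / 3] * 3
    T = partial_gw(C1, C2, p, q, m=0.6)
    print("shared cells in dataset 1:", shared_rows(T, thresh=0.05))
```

Because only mass `m` is transported, a cell whose row of the coupling carries almost no mass (here the point at 5.0, whose distances to the other cells have no counterpart in dataset 2) is a natural candidate for a dataset-specific cell. The subsequent alignment of both modalities in a common low-dimensional space is not shown in this sketch.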
Pamona software is available at https://github.com/caokai1073/Pamona.
Supplementary data are available at Bioinformatics online.