Department of Biostatistics, School of Public Health, Peking University, 38 Xueyuan Rd., Haidian District, Beijing 100191, China.
Peking University Cancer Hospital, 52 Fucheng Rd., Haidian District, Beijing 100142, China.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae540.
The integration of data from multiple modalities generated by single-cell omics technologies is crucial for accurately identifying cell states. One challenge in interpreting multi-omics data is mosaic integration, in which different data modalities are profiled in different subsets of cells, because it requires simultaneous batch effect removal and modality alignment. Here, we develop Multi-omics Mosaic Auto-scaling Attention Variational Inference (mmAAVI), a scalable deep generative model for single-cell mosaic integration. Leveraging auto-scaling self-attention mechanisms, mmAAVI can map arbitrary combinations of omics to a common embedding space. When well-annotated cell states are available, the model can perform semi-supervised learning to make use of these annotations. We validated the performance of mmAAVI and five other commonly used methods on four benchmark datasets, which vary in cell numbers, omics types, and missing patterns. mmAAVI consistently outperformed the other methods. We also validated mmAAVI's ability to transfer cell state knowledge, achieving balanced accuracies of 0.82 and 0.97 with less than 1% of cells labeled, between batches with completely different omics. The full package is available at https://github.com/luyiyun/mmAAVI.
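The abstract reports balanced accuracy without defining it. As a minimal sketch, assuming the standard definition (the mean of per-class recall, which keeps rare cell states from being swamped by abundant ones), the metric can be computed as follows. The function name and the toy labels are hypothetical illustrations, not taken from the mmAAVI package or its benchmarks.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true.

    Assumes the standard definition of balanced accuracy; the source
    does not state which variant was used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    total = defaultdict(int)    # number of cells per true class
    correct = defaultdict(int)  # correctly predicted cells per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Average the recall of each class, giving every class equal weight.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Made-up toy labels: four cells of class "T", two of class "B".
y_true = ["T", "T", "T", "T", "B", "B"]
y_pred = ["T", "T", "T", "B", "B", "B"]
score = balanced_accuracy(y_true, y_pred)
print(score)
```

In this toy case the recall is 3/4 for "T" and 2/2 for "B", so the balanced accuracy is 0.875, whereas plain accuracy would be 5/6.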