Humboldt-Universität zu Berlin, Department of Computer Science, 10099 Berlin, Germany.
Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany.
Nucleic Acids Res. 2024 Jul 8;52(12):e52. doi: 10.1093/nar/gkae409.
Multi-omics characterization of single cells holds outstanding potential for profiling the dynamics and relations of gene regulatory states of thousands of cells. How to integrate multimodal data is an open problem, especially when aiming to combine data from multiple sources or conditions containing both biological and technical variation. We introduce liam, a flexible model for the simultaneous horizontal and vertical integration of paired single-cell multimodal data and mosaic integration of paired with unimodal data. Liam learns a joint low-dimensional representation of the measured modalities, which proves beneficial when the information content or quality of the modalities differ. Its integration accounts for complex batch effects using a tunable combination of conditional and adversarial training, which can be optimized using replicate information while retaining selected biological variation. We demonstrate liam's superior performance on multiple paired multimodal data types, including Multiome and CITE-seq data, and in mosaic integration scenarios. Our detailed benchmarking experiments illustrate the complexities and challenges remaining for integration and the meaningful assessment of its success.
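The abstract says integration handles batch effects with a tunable combination of conditional and adversarial training. The numpy sketch below shows how such a combined objective can be put together. It is not liam's implementation; the paper describes the model's actual architecture and losses. The sketch rests on several assumptions:

- a linear encoder and decoder;
- a squared-error reconstruction loss;
- a linear softmax batch classifier acting as the adversary.

The names `encoder_objective`, `adversary_loss` and `adv_scale` are hypothetical, as is the toy data.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Encode integer batch labels as one-hot rows (the conditional covariate)."""
    out = np.zeros((labels.size, n_classes))
    out[np.arange(labels.size), labels] = 1.0
    return out


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def adversary_loss(z, batch, W_adv):
    """Cross-entropy of a hypothetical linear batch classifier reading latent z.

    The adversary itself is trained (separately) to MINIMIZE this value.
    """
    probs = softmax(z @ W_adv)
    return -np.mean(np.log(probs[np.arange(batch.size), batch] + 1e-12))


def encoder_objective(x, batch, W_enc, W_dec, W_adv, n_batches, adv_scale):
    """Encoder/decoder loss in a hypothetical conditional + adversarial setup.

    Conditional part: the decoder receives [z, one_hot(batch)], so batch-specific
    shifts can be explained by the covariate instead of being stored in z.
    Adversarial part: the encoder is rewarded when the batch classifier fails,
    i.e. it MAXIMIZES adversary_loss; adv_scale tunes that pressure
    (adv_scale = 0 leaves a purely conditional model).
    """
    z = x @ W_enc                              # linear encoder (assumption)
    b = one_hot(batch, n_batches)
    x_hat = np.hstack([z, b]) @ W_dec          # batch-conditioned linear decoder
    recon = np.mean((x - x_hat) ** 2)          # squared-error reconstruction (assumption)
    adv = adversary_loss(z, batch, W_adv)
    return recon - adv_scale * adv, recon, adv


# Toy usage: 6 cells, 4 features, 3 batches, 2-dimensional latent space.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
batch = np.array([0, 1, 2, 0, 1, 2])
W_enc = rng.normal(size=(4, 2))
W_dec = rng.normal(size=(2 + 3, 4))
W_adv = rng.normal(size=(2, 3))
for scale in (0.0, 1.0):
    total, recon, adv = encoder_objective(x, batch, W_enc, W_dec, W_adv, 3, scale)
    print(scale, round(float(total), 4))
```

With `adv_scale` set to 0, only the conditional mechanism acts. Raising it pushes the encoder to strip batch information from `z`, possibly at the cost of biological signal that correlates with batch. The abstract describes tuning this trade-off with replicate information.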