Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China.
Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China.
Cell Genom. 2024 May 8;4(5):100553. doi: 10.1016/j.xgen.2024.100553. Epub 2024 Apr 29.
Single-cell RNA sequencing (scRNA-seq) and T cell receptor sequencing (TCR-seq) are pivotal for investigating T cell heterogeneity. Integrating these modalities is expected to uncover immunological insights that would go unnoticed with a single modality, but it faces computational challenges due to the low-resource characteristics of the multimodal data. Herein, we present UniTCR, a novel low-resource-aware multimodal representation learning framework designed for unified cross-modality integration, enabling comprehensive T cell analysis. UniTCR combines a dual-modality contrastive learning module with a single-modality preservation module to embed each modality into a common latent space. In a low-resource-aware way, it connects TCR sequences with T cell transcriptomes across various tasks, including single-modality analysis, modality gap analysis, epitope-TCR binding prediction, and TCR profile cross-modality generation. Extensive evaluations on multiple paired scRNA-seq/TCR-seq datasets showed the superior performance of UniTCR, demonstrating its ability to explore the complexity of the immune system.
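To make the idea of embedding two modalities into a common latent space concrete, the following is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) loss over paired embeddings. It is not UniTCR's implementation: the abstract does not specify the loss, so the function names, the `temperature` value, and the use of cosine similarity are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(tcr_emb, rna_emb, temperature=0.1):
    """Generic symmetric InfoNCE loss for paired embeddings (illustrative only).

    Row i of `tcr_emb` and row i of `rna_emb` are assumed to come from the
    same cell; every other row in the batch serves as a negative.
    `temperature` is a hypothetical hyperparameter, not a value from the paper.
    """
    tcr = l2_normalize(np.asarray(tcr_emb, dtype=float))
    rna = l2_normalize(np.asarray(rna_emb, dtype=float))
    logits = tcr @ rna.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])            # matched pairs lie on the diagonal
    loss_tcr_to_rna = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_rna_to_tcr = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_tcr_to_rna + loss_rna_to_tcr)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))              # stand-in for encoder outputs
    print("aligned pairs:   ", symmetric_contrastive_loss(emb, emb))
    print("mismatched pairs:", symmetric_contrastive_loss(emb, emb[::-1]))
```

Under these assumptions, the loss is small when each TCR embedding lies closest to the transcriptome embedding of its own cell and grows when the pairs are mismatched, which is the property a contrastive module relies on to align the two modalities.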