sCIN: a contrastive learning framework for single-cell multi-omics data integration.
Authors
Ebrahimi Amir, Siahpirani Alireza Fotuhi, Montazeri Hesam
Affiliations
Department of Biotechnology, College of Science, University of Tehran, Ghods 37, Tehran, 1417763135, Iran.
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Ghods 37, Tehran, 1417763135, Iran.
Publication information
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf411.
The rapid advancement of single-cell omics technologies, such as single-cell RNA sequencing and the single-cell assay for transposase-accessible chromatin with high-throughput sequencing, has transformed our understanding of cellular heterogeneity and regulatory mechanisms. However, integrating these data types remains challenging due to distributional discrepancies and distinct feature spaces. To address this, we present a novel single-cell Contrastive INtegration framework (sCIN) that integrates different omics modalities into a shared low-dimensional latent space. sCIN uses modality-specific encoders and contrastive learning to generate latent representations for each modality, aligning cells across modalities and removing technology-specific biases. The framework was designed to rigorously prevent data leakage between training and testing, and was extensively evaluated on three real-world paired datasets, namely simultaneous high-throughput ATAC and RNA expression with sequencing (SHARE-seq), 10X PBMC (10k version), and cellular indexing of transcriptomes and epitopes (CITE-seq), as well as one unpaired dataset of gene expression and chromatin accessibility. Paired datasets are multi-omics data generated with technologies that capture different omics features from the same cell population, whereas unpaired datasets are measured from different cell populations of a tissue. Results on paired and unpaired datasets show that sCIN outperforms state-of-the-art models, including scGLUE, scBridge, sciCAN, Con-AAE, Harmony, and MOFA+, across multiple metrics: average silhouette width for clustering quality, and Recall@k, cell type@k, cell type accuracy, and median rank for integration quality. Moreover, sCIN was evaluated on simulated unpaired datasets derived from paired data, demonstrating its ability to leverage available biological information for effective multimodal integration.
In summary, sCIN reliably integrates omics modalities while preserving biological meaning in both paired and unpaired settings.
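The abstract names contrastive alignment of two modality-specific embeddings and the Recall@k and median-rank metrics, but gives no formulas. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: the loss is assumed to be a symmetric InfoNCE (CLIP-style) objective, Recall@k is assumed to be the fraction of cells whose true counterpart in the other modality ranks within the top k by cosine similarity, and median rank is the median of those ranks. The function names, the temperature value, and the synthetic embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def similarity(z_a, z_b):
    """Cosine similarity matrix: row i is cell i in modality A, column j is cell j in B."""
    return l2_normalize(z_a) @ l2_normalize(z_b).T


def symmetric_info_nce(z_a, z_b, temperature=0.1):
    """Assumed loss: paired cells (the diagonal) are positives, all others negatives."""
    logits = similarity(z_a, z_b) / temperature

    def diag_cross_entropy(m):
        m = m - m.max(axis=1, keepdims=True)  # stabilise the softmax
        log_prob = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the A-to-B and B-to-A directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


def match_ranks(z_a, z_b):
    """1-based rank of each cell's true counterpart among all modality-B cells."""
    sim = similarity(z_a, z_b)
    true_sim = np.diag(sim)[:, None]
    return 1 + (sim > true_sim).sum(axis=1)


def recall_at_k(z_a, z_b, k):
    """Fraction of cells whose true counterpart is among the k most similar cells."""
    return float(np.mean(match_ranks(z_a, z_b) <= k))


def median_rank(z_a, z_b):
    """Median rank of the true counterpart (1 is best)."""
    return float(np.median(match_ranks(z_a, z_b)))


# Synthetic stand-ins for encoder outputs (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
z_rna = rng.normal(size=(200, 16))
z_atac_aligned = z_rna + 0.05 * rng.normal(size=(200, 16))   # well-aligned pairs
z_atac_shuffled = z_atac_aligned[rng.permutation(200)]       # pairing destroyed

for name, z_atac in [("aligned", z_atac_aligned), ("shuffled", z_atac_shuffled)]:
    print(
        name,
        "loss:", round(symmetric_info_nce(z_rna, z_atac), 3),
        "Recall@10:", recall_at_k(z_rna, z_atac, 10),
        "median rank:", median_rank(z_rna, z_atac),
    )
```

In this toy setting, well-aligned embeddings should give a lower loss, a higher Recall@k, and a lower median rank than embeddings whose pairing has been shuffled, which is the direction in which these metrics reward integration quality.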