scCorrect: Cross-modality label transfer from scRNA-seq to scATAC-seq using domain adaptation.

Authors

Liu Yan, Pei Wenyi, Chen Li, Xia Yu, Yan He, Hu Xiaohua

Affiliations

Department of Computer Science, Yangzhou University, Yangzhou, 225100, PR China.

Geriatric Department, Shanghai Baoshan District Wusong Central Hospital, Tongtai North Road 101, Shanghai, 200940, PR China.

Publication details

Anal Biochem. 2025 Jul;702:115847. doi: 10.1016/j.ab.2025.115847. Epub 2025 Mar 27.

DOI:10.1016/j.ab.2025.115847

PMID:40154828

Abstract

Cell type annotation in single-cell chromatin accessibility sequencing (scATAC-seq) is crucial for enabling researchers to identify subpopulations of cells associated with specific diseases, elucidate gene regulatory networks, and discover markers indicative of disease states. The prevailing approach to cell type annotation in single-cell research is to transfer well-delineated cell types from single-cell RNA sequencing (scRNA-seq) data to scATAC-seq data using a label propagation algorithm. However, the inherent modal discrepancies between scRNA-seq and scATAC-seq data (i.e., differences in biological interpretation), coupled with the intrinsic sparsity and high dimensionality of scATAC-seq data, pose significant challenges to the efficacy of this strategy. To address these challenges, we introduce a novel neural network framework, scCorrect, which operates in two distinct phases. In the first phase, scCorrect aligns the scRNA-seq and scATAC-seq datasets and generates initial annotation results. In the second phase, scCorrect trains a corrective network specifically designed to amend erroneous annotations produced during the first phase. Empirical tests across multiple datasets demonstrate that scCorrect consistently achieves superior recognition accuracy, underscoring its significant potential to enhance disease-related research in humans.
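The abstract does not describe scCorrect's architecture, so the following is only a minimal sketch of the general "align, then annotate, then correct" idea on synthetic data, not the authors' method. It assumes both modalities have already been mapped to a shared feature space (for example, scATAC-seq summarized as gene-level scores). Phase 1 is stood in for by CORAL (correlation alignment), a classical domain-adaptation technique, followed by a nearest-centroid classifier. The corrective network of phase 2 is replaced by a simple k-nearest-neighbour majority vote among target cells. All function names (`coral_align`, `nearest_centroid`, `knn_correct`) and the toy data are hypothetical.

```python
import numpy as np


def _sqrtm_psd(c, inverse=False):
    # (Inverse) square root of a symmetric positive-definite matrix via eigendecomposition.
    w, v = np.linalg.eigh(c)
    w = np.clip(w, 1e-12, None)
    power = -0.5 if inverse else 0.5
    return (v * w**power) @ v.T


def coral_align(src, tgt, eps=1.0):
    # Phase 1a (hypothetical stand-in): CORAL. Whiten source features with the
    # source covariance, re-color them with the target covariance, match means.
    d = src.shape[1]
    cs = np.cov(src, rowvar=False) + eps * np.eye(d)
    ct = np.cov(tgt, rowvar=False) + eps * np.eye(d)
    centered = src - src.mean(axis=0)
    return centered @ _sqrtm_psd(cs, inverse=True) @ _sqrtm_psd(ct) + tgt.mean(axis=0)


def nearest_centroid(train_x, train_y, test_x):
    # Phase 1b: label each target cell with its closest source-class centroid.
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def knn_correct(x, labels, k=7):
    # Phase 2 (hypothetical stand-in for the corrective network): replace each
    # target label with the majority label among its k nearest target cells
    # (the cell itself included).
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    neighbours = np.argsort(d, axis=1)[:, :k]
    out = labels.copy()
    for i, idx in enumerate(neighbours):
        values, counts = np.unique(labels[idx], return_counts=True)
        out[i] = values[counts.argmax()]
    return out


# Toy data: three cell types; the "ATAC" domain is a scaled, shifted copy of
# the "RNA" feature distribution (a crude, hypothetical modality shift).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
rna_y = np.repeat([0, 1, 2], 50)
rna_x = centers[rna_y] + rng.normal(size=(150, 2))
atac_y = np.repeat([0, 1, 2], 40)  # ground truth, used only for evaluation
atac_x = 1.5 * (centers[atac_y] + rng.normal(size=(120, 2))) + 3.0

aligned_rna = coral_align(rna_x, atac_x)
phase1 = nearest_centroid(aligned_rna, rna_y, atac_x)
corrected = knn_correct(atac_x, phase1)
print("phase 1 accuracy:", (phase1 == atac_y).mean())
print("after correction:", (corrected == atac_y).mean())
```

CORAL matches only first- and second-order statistics, so it is far weaker than a learned alignment network on real, sparse, high-dimensional scATAC-seq data; it is used here only because it fits in a few lines and makes the role of the alignment phase visible.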
