School of Computer Science and Engineering, Central South University, Changsha 410083, China.
Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore 138648, Singapore.
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad505.
scATAC-seq has enabled profiling of the chromatin accessibility landscape at the single-cell level, providing opportunities to determine cell-type-specific regulatory codes. However, the high dimensionality, extreme sparsity, and large scale of scATAC-seq data pose great challenges to cell-type identification. There has therefore been growing interest in leveraging well-annotated scRNA-seq data to help annotate scATAC-seq data. Substantial computational obstacles nevertheless remain in transferring information from scRNA-seq to scATAC-seq, especially because the two modalities have heterogeneous features.
We propose a new transfer learning method, scNCL, which uses prior knowledge and contrastive learning to tackle the problem of heterogeneous features. Briefly, scNCL transforms scATAC-seq features into a gene activity matrix based on prior knowledge. Because this feature transformation can cause information loss, scNCL introduces neighborhood contrastive learning to preserve the neighborhood structure that scATAC-seq cells have in the raw feature space. To learn transferable latent features, scNCL uses a feature projection loss and an alignment loss to harmonize embeddings between scRNA-seq and scATAC-seq. Experiments on various datasets demonstrated that scNCL not only achieves accurate and robust label transfer for common cell types, but also detects novel cell types reliably. scNCL is also computationally efficient and scalable to million-scale datasets. Moreover, we show that scNCL can help refine cell-type annotations in existing scATAC-seq atlases.
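To make the idea of neighborhood contrastive learning more concrete, the following is a minimal sketch in plain Python. It is not the scNCL implementation: the function names, the use of cosine similarity, the temperature `tau`, and the InfoNCE-style form of the loss are all assumptions chosen for illustration. The sketch finds each cell's nearest neighbors in the raw feature space, then penalizes embeddings in which those neighbors are no longer close.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def knn_indices(raw, k):
    """For each cell, indices of its k most similar cells in raw feature space."""
    neighbors = []
    for i, xi in enumerate(raw):
        others = [j for j in range(len(raw)) if j != i]
        others.sort(key=lambda j: cosine(xi, raw[j]), reverse=True)
        neighbors.append(others[:k])
    return neighbors


def neighborhood_contrastive_loss(emb, neighbors, tau=0.5):
    """Hypothetical InfoNCE-style loss: raw-space neighbors act as positives,
    all other cells act as negatives. Lower means the embedding keeps
    raw-space neighbors close together."""
    n = len(emb)
    total = 0.0
    for i in range(n):
        # Similarity of cell i to every other cell, scaled by the temperature.
        weights = {j: math.exp(cosine(emb[i], emb[j]) / tau)
                   for j in range(n) if j != i}
        positives = sum(weights[j] for j in neighbors[i])
        total += -math.log(positives / sum(weights.values()))
    return total / n


# Toy data (made up): four cells forming two groups in raw feature space.
raw = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
neighbors = knn_indices(raw, k=1)

# An embedding that keeps the raw neighborhoods, and one that breaks them
# by swapping the embeddings of cells 1 and 2.
emb_good = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
emb_bad = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_good = neighborhood_contrastive_loss(emb_good, neighbors)
loss_bad = neighborhood_contrastive_loss(emb_bad, neighbors)
```

In this toy setting the embedding that preserves raw-space neighborhoods receives the lower loss, which is the behavior a neighborhood-preserving objective is meant to reward. In a real model the embeddings would come from a trainable encoder applied to the gene activity features, and the loss would be minimized by gradient descent.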
The source code and data used in this paper are available at https://github.com/CSUBioGroup/scNCL-release.