Eltager Mostafa, Abdelaal Tamim, Mahfouz Ahmed, Reinders Marcel J T
Delft Bioinformatics Lab, Delft University of Technology, Delft 2628XE, The Netherlands.
Leiden Computational Biology Center, Leiden University Medical Center, Leiden 2333ZC, The Netherlands.
Bioinform Adv. 2022 Feb 15;2(1):vbac011. doi: 10.1093/bioadv/vbac011. eCollection 2022.
Single-cell multi-omics assays simultaneously measure different molecular features from the same cell. A key question is how to benefit from the complementary data available and perform cross-modal clustering of cells.
We propose Single-cell Multi-omics Clustering (scMoC), an approach to identify cell clusters from data with co-measurements of scRNA-seq and scATAC-seq from the same cell. We overcome the high sparsity of the scATAC-seq data with an imputation strategy that exploits the less sparse scRNA-seq data available from the same cell. Subsequently, scMoC identifies clusters of cells by merging the clusterings derived from each data domain individually. We tested scMoC on datasets generated using different protocols with variable data sparsity levels. We show that scMoC (i) generates informative scATAC-seq data through its RNA-guided imputation strategy and (ii) yields integrated clusters, based on both RNA and ATAC information, that are biologically meaningful from either the RNA or the ATAC perspective.
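The two steps described above, RNA-guided imputation followed by merging of per-modality clusterings, can be illustrated with a minimal sketch. This is not the scMoC implementation: the function names `rna_guided_impute` and `merge_clusterings`, the choice of Euclidean k-nearest neighbours, the parameters `k` and `min_cells`, and the simple split rule are all assumptions made for illustration only.

```python
import numpy as np


def rna_guided_impute(rna, atac, k=3):
    """Smooth each cell's ATAC profile over its k nearest neighbours in RNA space.

    Hypothetical helper: rna is (cells x genes), atac is (cells x peaks),
    and rows of both matrices refer to the same cells.
    """
    # Pairwise squared Euclidean distances between cells in RNA space.
    diff = rna[:, None, :] - rna[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Indices of the k closest cells; a cell is at distance 0 from itself,
    # so its own profile is part of the average.
    neighbours = np.argsort(dist, axis=1, kind="stable")[:, :k]
    # Average the ATAC profiles of the neighbours: (cells x k x peaks) -> (cells x peaks).
    return atac[neighbours].mean(axis=1)


def merge_clusterings(rna_labels, atac_labels, min_cells=2):
    """Split an RNA cluster by ATAC label when the sub-group is large enough.

    Assumed rule for illustration: a cell keeps its RNA label unless at least
    min_cells cells share the same (RNA label, ATAC label) pair.
    """
    pairs = list(zip(rna_labels, atac_labels))
    counts = {}
    for pair in pairs:
        counts[pair] = counts.get(pair, 0) + 1
    return [
        f"{r}_{a}" if counts[(r, a)] >= min_cells else f"{r}"
        for r, a in pairs
    ]
```

In practice the imputed ATAC matrix would be clustered separately from the RNA matrix with a clustering method of choice, and the two label vectors would then be passed to the merging step.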
The data used in this manuscript are publicly available, and we refer to the original manuscripts for their description and availability. For convenience, the sci-CAR data are available at NCBI GEO under accession number GSE117089. The SNARE-seq data are available at NCBI GEO under accession number GSE126074. The 10X multiome data are available at https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-no-cell-sorting-3-k-1-standard-2-0-0.
Supplementary data are available at Bioinformatics Advances online.