Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA.
Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA.
Commun Biol. 2024 Jan 2;7(1):1. doi: 10.1038/s42003-023-05690-5.
The proliferation of single-cell RNA-sequencing data has led to the widespread use of cellular deconvolution, which helps extract cell-type-specific information from large collections of bulk data. However, these advances have mostly been limited to transcriptomic data. With recent developments in single-cell DNA methylation (scDNAm), there are emerging opportunities for deconvolving bulk DNAm data, particularly for solid tissues such as brain that lack cell-type references. Due to technical limitations, current scDNAm sequencing covers only a small proportion of the genome in each cell, and the detected regions differ across cells. This makes scDNAm data ultra-high dimensional and ultra-sparse. To deal with these challenges, we introduce scMD (single cell Methylation Deconvolution), a cellular deconvolution framework to reliably estimate cell type fractions from tissue-level DNAm data. To analyze large-scale complex scDNAm data, scMD employs a statistical approach to aggregate scDNAm data at the cell cluster level, identify cell-type marker DNAm sites, and create precise cell-type signature matrices that surpass state-of-the-art sorted-cell or RNA-derived references. Through thorough benchmarking in several datasets, we demonstrate scMD's superior performance in estimating cellular fractions from bulk DNAm data. With scMD-estimated cellular fractions, we identify cell type fractions and cell type-specific differentially methylated cytosines associated with Alzheimer's disease.
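The two generic steps named in the abstract, aggregating sparse single-cell methylation calls at the cluster level and estimating cell type fractions from a bulk profile with a signature matrix, can be illustrated with a minimal sketch. This is not the scMD implementation: the function names `aggregate_clusters` and `estimate_fractions` are hypothetical, the aggregation is a plain pooled methylated/total read ratio, and the deconvolution is non-negative least squares solved by projected gradient descent, chosen here only because it is a common reference-based approach. Marker-site selection is omitted.

```python
import numpy as np


def aggregate_clusters(mc, cov, labels):
    """Pool per-cell counts into cluster-level methylation levels.

    mc, cov: (cells x sites) methylated and total read counts.
    labels:  cluster label per cell.
    Returns (sorted cluster names, sites x clusters beta matrix).
    Sites with no coverage in a cluster are NaN.
    Assumption: simple pooling, not scMD's statistical procedure.
    """
    mc = np.asarray(mc, dtype=float)
    cov = np.asarray(cov, dtype=float)
    labels = np.asarray(labels)
    names = sorted(set(labels.tolist()))
    columns = []
    for name in names:
        mask = labels == name
        m = mc[mask].sum(axis=0)
        c = cov[mask].sum(axis=0)
        # Divide only where coverage exists; leave NaN elsewhere.
        beta = np.divide(m, c, out=np.full_like(m, np.nan), where=c > 0)
        columns.append(beta)
    return names, np.column_stack(columns)


def estimate_fractions(signature, bulk, n_iter=5000):
    """Estimate cell type fractions for one bulk sample.

    signature: (sites x cell types) reference methylation levels.
    bulk:      (sites,) bulk methylation levels at the same sites.
    Minimises ||signature @ f - bulk||^2 subject to f >= 0 by
    projected gradient descent, then rescales f to sum to 1.
    """
    S = np.asarray(signature, dtype=float)
    b = np.asarray(bulk, dtype=float)
    # Step size 1/L, where L is the squared largest singular value of S.
    step = 1.0 / np.linalg.norm(S, 2) ** 2
    f = np.full(S.shape[1], 1.0 / S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ f - b)
        f = np.maximum(f - step * grad, 0.0)  # project onto f >= 0
    total = f.sum()
    return f / total if total > 0 else f


if __name__ == "__main__":
    # Toy signature: 4 marker sites x 3 hypothetical cell types.
    S = np.array([[0.9, 0.1, 0.1],
                  [0.1, 0.8, 0.2],
                  [0.2, 0.1, 0.9],
                  [0.5, 0.5, 0.5]])
    bulk = S @ np.array([0.6, 0.3, 0.1])  # noise-free mixture
    print(estimate_fractions(S, bulk))
```

In practice the signature matrix would be restricted to marker sites with adequate coverage in every cluster, since NaN entries from `aggregate_clusters` cannot be passed to `estimate_fractions` directly.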