School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China.
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen 518110, Guangdong, China.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad533.
Single-cell DNA methylation sequencing assays DNA methylation at single-cell resolution. However, incomplete coverage compromises downstream analyses, which underscores the importance of imputation techniques. As the number of cells in recent large datasets grows, scalable and efficient imputation models are critical for addressing sparsity in genome-wide analyses.
We propose GraphCpG, a novel graph-based deep learning approach that imputes methylation matrices from locus-aware neighboring subgraphs, with locus-aware encoding oriented on one cell type. Using only the CpG methylation matrix, GraphCpG outperforms previous methods on datasets containing hundreds of cells or more and achieves competitive performance on smaller datasets; the subgraphs of predicted sites can be visualized as retrievable bipartite graphs. Beyond imputation performance that improves with increasing cell number, it significantly reduces computation time and demonstrates improvement in downstream analysis.
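To make the idea of a neighboring subgraph on a cell-by-CpG bipartite graph concrete, the following is a minimal sketch and not the GraphCpG implementation. It assumes a small methylation matrix with `NaN` marking uncovered sites, treats each observed entry as a labeled edge between a cell node and a CpG-site node, and uses column-index distance as a stand-in for genomic distance when picking neighboring sites. The function name `extract_subgraph` and the parameters `k_sites` and `k_cells` are hypothetical.

```python
import numpy as np


def extract_subgraph(M, cell, site, k_sites=2, k_cells=2):
    """Collect a 1-hop bipartite subgraph around the target (cell, site) pair.

    M is a cells x CpG-sites matrix with entries 0/1 and NaN for unobserved.
    Returns the cell nodes, the site nodes, and the observed edges among them
    as (cell, site, methylation_state) triples. The target edge is excluded.
    """
    observed = ~np.isnan(M)

    # Sites covered in the target cell, nearest column index first
    # (assumption: columns are ordered by genomic position).
    site_candidates = np.flatnonzero(observed[cell])
    site_candidates = site_candidates[site_candidates != site]
    order = np.argsort(np.abs(site_candidates - site), kind="stable")
    nbr_sites = site_candidates[order][:k_sites]

    # Other cells in which the target site is covered.
    cell_candidates = np.flatnonzero(observed[:, site])
    cell_candidates = cell_candidates[cell_candidates != cell]
    nbr_cells = cell_candidates[:k_cells]

    cells = np.concatenate(([cell], nbr_cells))
    sites = np.concatenate(([site], nbr_sites))

    # Keep every observed entry between the selected nodes, except the target.
    edges = [
        (int(c), int(s), int(M[c, s]))
        for c in cells
        for s in sites
        if observed[c, s] and not (c == cell and s == site)
    ]
    return cells.tolist(), sites.tolist(), edges


nan = np.nan
M = np.array([
    [1.0, nan, 0.0, 1.0],
    [nan, 1.0, 0.0, nan],
    [1.0, 1.0, nan, 0.0],
])

# Target: the uncovered entry for cell 1 at site 0.
cells, sites, edges = extract_subgraph(M, cell=1, site=0)
print(cells, sites, edges)
```

In a graph-based imputation model, a subgraph of this kind would be passed to a graph neural network that predicts the state of the missing target edge; that learning step is outside the scope of this sketch.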
The source code is freely available at https://github.com/yuzhong-deng/graphcpg.git.