Tang Jianxiong, Zou Jianxiao, Fan Mei, Tian Qi, Zhang Jiyang, Fan Shicai
Department of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
Chengdu Women's and Children's Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu 611731, China.
Bioinformatics. 2021 Jul 27;37(13):1814-1820. doi: 10.1093/bioinformatics/btab029.
Single-cell DNA methylation sequencing detects methylation levels at single-cell resolution, and this technology is advancing our understanding of how epigenetic modifications regulate gene expression. However, almost all current technologies suffer from an inherent problem: only a small fraction of CpGs is covered in each cell. Addressing the resulting sparsity of the raw data is therefore essential for genome-wide quantitative analysis.
Here, we report CaMelia, a CatBoost gradient boosting method that predicts missing methylation states based on the locally paired similarity of intercellular methylation patterns. On real single-cell methylation datasets, CaMelia yielded significant imputation performance gains over previous methods. Furthermore, when we applied the imputed data to the downstream analysis of cell-type identification, we found that CaMelia helped to discover more intercellular differentially methylated loci that had been masked by the sparsity of the raw data, and the clustering results demonstrated that CaMelia could preserve cell-cell relationships and improve the identification of cell types and cell subpopulations.
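The abstract does not list CaMelia's exact features, so the following is only a minimal pure-Python sketch of the general idea of "locally paired similarity", under stated assumptions. Each cell is a row of CpG states (1 = methylated, 0 = unmethylated, None = not covered). For a missing entry, the target cell is compared with every other cell that covers that CpG, using the agreement rate over neighbouring CpGs that both cells cover. A similarity-weighted average of those cells' states then stands in for the trained CatBoost model that plays this predictive role in CaMelia. The function names `local_similarity`, `paired_features` and `impute`, the `window` parameter and the 0.5 fallback are all hypothetical and are not the published tool's API.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: rows = cells, columns = CpG sites in genomic order.
# 1 = methylated, 0 = unmethylated, None = CpG not covered in that cell.
Matrix = List[List[Optional[int]]]


def local_similarity(a: List[Optional[int]], b: List[Optional[int]],
                     center: int, window: int) -> Tuple[float, int]:
    """Agreement rate between two cells over CpGs within `window` positions
    of `center` (excluding `center`) that BOTH cells cover.
    Returns (similarity, number_of_shared_CpGs)."""
    lo = max(0, center - window)
    hi = min(len(a), center + window + 1)
    shared = agree = 0
    for j in range(lo, hi):
        if j == center or a[j] is None or b[j] is None:
            continue
        shared += 1
        agree += int(a[j] == b[j])
    return (agree / shared if shared else 0.0), shared


def paired_features(matrix: Matrix, cell: int, site: int,
                    window: int) -> List[Tuple[float, int]]:
    """(similarity, state) pairs from every other cell that covers `site`
    and shares at least one neighbouring CpG with the target cell."""
    target = matrix[cell]
    feats = []
    for k, other in enumerate(matrix):
        state = other[site]
        if k == cell or state is None:
            continue
        sim, shared = local_similarity(target, other, site, window)
        if shared:
            feats.append((sim, state))
    return feats


def impute(matrix: Matrix, window: int = 2, default: float = 0.5) -> List[List[float]]:
    """Fill each missing entry with a similarity-weighted mean of other cells'
    states at the same CpG (a simple stand-in for a CatBoost predictor).
    Falls back to `default` when no informative neighbour cell exists.
    The input matrix is not modified."""
    out = [list(row) for row in matrix]
    for c, row in enumerate(matrix):
        for s, value in enumerate(row):
            if value is not None:
                continue
            feats = paired_features(matrix, c, s, window)
            total = sum(sim for sim, _ in feats)
            out[c][s] = (sum(sim * st for sim, st in feats) / total
                         if total > 0 else default)
    return out


if __name__ == "__main__":
    cells = [[1, 0, None, 0],
             [1, 0, 1, 0],
             [1, 1, 0, 1]]
    # Cell 1 agrees with cell 0 at all shared neighbours, so its state dominates.
    print(impute(cells)[0][2])
```

Imputed values here are methylation probabilities rather than hard 0/1 calls. Weighting by local agreement lets cells with a similar nearby methylation pattern dominate the estimate, which is the intuition behind using intercellular similarity; a gradient-boosting model such as CatBoost can instead learn how much to trust each similarity feature from the observed entries.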
Python code is available at https://github.com/JxTang-bioinformatics/CaMelia.
Supplementary data are available at Bioinformatics online.