CaMelia: imputation in single-cell methylomes based on local similarities between cells.

Author information

Tang Jianxiong, Zou Jianxiao, Fan Mei, Tian Qi, Zhang Jiyang, Fan Shicai

Affiliations

Department of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.

Chengdu Women's and Children's Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu 611731, China.

Publication information

Bioinformatics. 2021 Jul 27;37(13):1814-1820. doi: 10.1093/bioinformatics/btab029.

Abstract

MOTIVATION

Single-cell DNA methylation sequencing measures methylation levels at single-cell resolution and is advancing our understanding of how epigenetic modifications regulate gene expression. However, almost all current technologies share an inherent problem: low coverage of CpG sites. Addressing this sparsity in the raw data is therefore essential for quantitative genome-wide analysis.

RESULTS

Here, we report CaMelia, a CatBoost-based gradient boosting method that predicts missing methylation states from the locally paired similarity of intercellular methylation patterns. On real single-cell methylation datasets, CaMelia yielded significant gains in imputation performance over previous methods. We also applied the imputed data to the downstream task of cell-type identification. There, CaMelia helped uncover more intercellular differentially methylated loci that had been masked by the sparsity of the raw data. The clustering results showed that CaMelia preserves cell-cell relationships and improves the identification of cell types and cell subpopulations.
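To make the idea of local similarity between cells concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy matrix whose rows are cells and whose columns are CpG sites in genomic order, with `1` for methylated, `0` for unmethylated and `None` for uncovered sites. The helper names `local_similarity` and `impute_site`, the window size, and the similarity-weighted average are all illustrative assumptions. The weighted average stands in for the CatBoost model, which would instead be trained on features of this kind.

```python
from typing import List, Optional

State = Optional[int]  # 1 = methylated, 0 = unmethylated, None = not covered


def local_similarity(cell_a: List[State], cell_b: List[State],
                     site: int, window: int) -> Optional[float]:
    """Fraction of agreeing states among neighbouring CpGs seen in both cells.

    Hypothetical helper: only sites within +/- `window` of `site` are compared,
    and the target site itself is excluded.
    """
    lo = max(0, site - window)
    hi = min(len(cell_a), site + window + 1)
    shared = matches = 0
    for j in range(lo, hi):
        if j == site:
            continue
        a, b = cell_a[j], cell_b[j]
        if a is None or b is None:
            continue  # skip sites missing in either cell
        shared += 1
        matches += int(a == b)
    return matches / shared if shared else None


def impute_site(matrix: List[List[State]], target: int,
                site: int, window: int = 2) -> Optional[float]:
    """Similarity-weighted mean of other cells' states at `site`.

    Assumption: this average replaces the gradient boosting model purely
    for illustration. Returns None when no cell is informative.
    """
    weighted_sum = total_weight = 0.0
    for idx, other in enumerate(matrix):
        if idx == target or other[site] is None:
            continue
        sim = local_similarity(matrix[target], other, site, window)
        if not sim:  # None (no shared sites) or 0.0 (no agreement)
            continue
        weighted_sum += sim * other[site]
        total_weight += sim
    return weighted_sum / total_weight if total_weight else None


# Toy data: cell 0 is missing the state at site 2.
cells = [
    [1, 1, None, 0, 0],
    [1, 1, 1, 0, 0],        # locally identical to cell 0
    [0, 0, 0, 1, 1],        # locally opposite to cell 0
    [1, None, 1, 0, None],  # agrees with cell 0 where both are covered
]
estimate = impute_site(cells, target=0, site=2, window=2)
print(estimate)  # 1.0
```

In this toy case, only cells 1 and 3 resemble cell 0 around the missing site, and both are methylated there, so the estimate is 1.0. Cell 2 is ignored because it disagrees at every shared neighbouring site.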

AVAILABILITY AND IMPLEMENTATION

Python code is available at https://github.com/JxTang-bioinformatics/CaMelia.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
