School of Science, East China University of Technology, Nanchang, Jiangxi 330013, China.
Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA.
Bioinformatics. 2021 May 23;37(8):1052-1059. doi: 10.1093/bioinformatics/btaa930.
It is common practice in epigenetics research to profile DNA methylation in tissue samples, which are usually mixtures of different cell types. To properly account for the mixture, estimating cell compositions has been recognized as an important first step. Many methods have been developed for quantifying cell compositions from DNA methylation data, but most have limited applicability because the reference profiles or prior information they require are often unavailable.
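To make the mixture problem concrete, the following minimal sketch treats a bulk methylation profile as a proportion-weighted average of two pure cell-type profiles and recovers the proportion by least squares. It is an illustration only and is not the Tsisal algorithm: it assumes the pure profiles are known, which is exactly the reference information that is often missing in practice. All beta values and the function names `mix` and `estimate_prop_a` are hypothetical.

```python
# Illustrative sketch only: a two-cell-type linear mixture with KNOWN
# reference profiles (hypothetical numbers). This is not the Tsisal algorithm.

def mix(profile_a, profile_b, prop_a):
    """Bulk methylation signal for a sample with fraction prop_a of type A."""
    return [prop_a * a + (1.0 - prop_a) * b for a, b in zip(profile_a, profile_b)]

def estimate_prop_a(bulk, profile_a, profile_b):
    """Least-squares estimate of the type-A fraction, clipped to [0, 1]."""
    # Model: bulk - profile_b = prop_a * (profile_a - profile_b)
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    num = sum((y - b) * d for y, b, d in zip(bulk, profile_b, diff))
    den = sum(d * d for d in diff)
    if den == 0:
        raise ValueError("reference profiles are identical; proportion is not identifiable")
    return min(1.0, max(0.0, num / den))

# Hypothetical beta values at five CpG sites for two pure cell types.
type_a = [0.90, 0.10, 0.80, 0.20, 0.50]
type_b = [0.10, 0.85, 0.30, 0.20, 0.60]

# Simulate a noise-free bulk sample that is 30% type A, then estimate it back.
bulk = mix(type_a, type_b, prop_a=0.3)
estimated = estimate_prop_a(bulk, type_a, type_b)
print(round(estimated, 3))
```

Sites where the two profiles differ most carry the most weight in the estimate, while a site with identical values in both types (the fourth one here) contributes nothing. This is one way to see why identifying cell-type-specific CpG sites matters for deconvolution.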
We develop Tsisal, a novel complete deconvolution method that accurately estimates cell compositions from DNA methylation data without any prior knowledge of cell types or their proportions. Tsisal is a full pipeline that estimates the number of cell types, estimates cell compositions and identifies cell-type-specific CpG sites. It can also assign cell type labels when a full or partial reference panel is available. Extensive simulation studies and analyses of seven real datasets demonstrate the favorable performance of the proposed method compared with existing deconvolution methods that serve a similar purpose.
The proposed method Tsisal is implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST.
Supplementary data are available at Bioinformatics online.