Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA.
Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac722.
Genome-wide maps of epigenetic modifications are powerful resources for annotating the non-coding genome. Maps of multiple epigenetic marks have been integrated into chromatin state annotations specific to each of many cell and tissue types. As chromatin state maps become available for more groups of biologically similar samples, there is a need for methods that can effectively summarize the chromatin state annotations within a group of samples and identify differences between groups of samples at high resolution.
We developed CSREP, which takes as input chromatin state annotations for a group of samples, probabilistically estimates the state at each genomic position, and derives a representative chromatin state map for the group. CSREP uses an ensemble of multi-class logistic regression classifiers, each of which predicts the chromatin state assignment of one sample from the state maps of all other samples in the group. When two groups are compared, the difference between CSREP's probability assignments for the two groups can be used to identify genomic locations with differential chromatin state assignments. Using groups of chromatin state maps from a diverse set of cell and tissue types, we demonstrate the advantages of CSREP for summarizing chromatin state maps and for identifying biologically relevant differences between groups at high resolution.
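The leave-one-out ensemble idea can be illustrated with a minimal pure-Python sketch. It assumes that each sample's chromatin states are integer labels per genomic bin, that each classifier sees one-hot encoded states of the other samples as features, that it is trained by plain gradient descent on the same bins it predicts, and that the group-level probabilities are the average of the per-sample predictions. The function names (`fit_predict`, `representative_probs`, `representative_map`, `differential_scores`) are hypothetical and are not CSREP's API; the actual training procedure, features, and combination rule are in the repository and may differ.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def features(states, bin_idx, exclude, n_states):
    """Active one-hot feature indices at one bin: 'sample j is in state s'
    for every sample j except the held-out one (an assumed encoding)."""
    return [j * n_states + s for j, s in enumerate(states[bin_idx]) if j != exclude]


def fit_predict(states, target, n_states, n_iter=500, l2=1e-3):
    """Fit a multi-class (softmax) logistic regression that predicts sample
    `target`'s state from all other samples, then return per-bin probabilities.
    Simplification: trains and predicts on the same bins."""
    n_bins, n_samples = len(states), len(states[0])
    W = [[0.0] * n_states for _ in range(n_samples * n_states)]
    b = [0.0] * n_states
    X = [features(states, i, target, n_states) for i in range(n_bins)]
    y = [row[target] for row in states]
    lr = 1.0 / n_samples  # each bin has (n_samples - 1) active features plus a bias
    for _ in range(n_iter):
        gW = [[l2 * w for w in row] for row in W]  # L2 penalty on weights only
        gb = [0.0] * n_states
        for active, label in zip(X, y):
            p = softmax([b[k] + sum(W[f][k] for f in active) for k in range(n_states)])
            for k in range(n_states):
                g = (p[k] - (1.0 if k == label else 0.0)) / n_bins
                gb[k] += g
                for f in active:
                    gW[f][k] += g
        for f in range(len(W)):
            for k in range(n_states):
                W[f][k] -= lr * gW[f][k]
        for k in range(n_states):
            b[k] -= lr * gb[k]
    return [softmax([b[k] + sum(W[f][k] for f in active) for k in range(n_states)])
            for active in X]


def representative_probs(states, n_states):
    """Assumed combination rule: average the leave-one-out predictions."""
    n_bins, n_samples = len(states), len(states[0])
    avg = [[0.0] * n_states for _ in range(n_bins)]
    for t in range(n_samples):
        for i, p in enumerate(fit_predict(states, t, n_states)):
            for k in range(n_states):
                avg[i][k] += p[k] / n_samples
    return avg


def representative_map(probs):
    """Representative state per bin: the most probable state."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def differential_scores(group_a, group_b, n_states):
    """Per-bin, per-state probability difference, group A minus group B."""
    pa = representative_probs(group_a, n_states)
    pb = representative_probs(group_b, n_states)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(pa, pb)]


# Toy data: rows are genomic bins, columns are samples, values are state labels 0/1.
group_a = [[0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 1], [0, 0, 0], [1, 1, 0]]
group_b = [[1, 1, 1], [1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
probs_a = representative_probs(group_a, 2)
print(representative_map(probs_a))
diff = differential_scores(group_a, group_b, 2)
print([[round(v, 2) for v in row] for row in diff])
```

Holding out each sample in turn means a position where most other samples agree pulls the held-out prediction toward that consensus, while a position where samples disagree yields a flatter probability distribution. A large positive or negative entry in `diff` flags a bin and state whose probability differs strongly between the two groups.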
The CSREP source code and generated data are available at http://github.com/ernstlab/csrep.
Supplementary data are available at Bioinformatics online.