Meng Nan, Machiraju Raghu, Huang Kun
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA.
Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.
BMC Bioinformatics. 2016 Dec 23;17(Suppl 17):534. doi: 10.1186/s12859-016-1346-5.
Identification and analysis of recurrent combinatorial patterns of multiple chromatin modifications provide invaluable information for understanding epigenetic regulation. Furthermore, as more data become available, studying the combinatorial patterns of all modifications is computationally expensive and unnecessary.
A novel framework is proposed to investigate recurrent combinatorial patterns of a quantitatively selected subset of chromatin modifications. The framework is based on hierarchical clustering and selects subsets of chromatin modifications that form distinct recurrent patterns at regulatory regions. The identified recurrent combinatorial patterns can then be used to discover novel regulatory regions. The data are genome-wide maps of histone acetylations, methylations, and a histone variant in human skeletal muscle and B-lymphocyte cells, both derived from the ENCODE project.
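The selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each modification is summarized as a signal vector over the same set of regions, that similarity is measured as 1 minus the Pearson correlation, that clusters are merged by average linkage, and that one representative (the medoid) is kept per cluster. The mark names, the synthetic data, and the helper functions `pearson`, `cluster_marks`, and `pick_representatives` are all hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length signal vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)  # assumes neither vector is constant

def cluster_marks(signals, n_clusters):
    """Average-linkage agglomerative clustering of marks.

    Distance between two marks is 1 - Pearson r (an assumption of this sketch).
    Returns the clusters and the pairwise distance table.
    """
    names = sorted(signals)
    dist = {(a, b): 1.0 - pearson(signals[a], signals[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge the closest pair of clusters
        del clusters[j]
    return clusters, dist

def pick_representatives(clusters, dist):
    """Keep the medoid of each cluster: the mark closest to its cluster mates."""
    reps = []
    for members in clusters:
        def total(m):
            return sum(dist[m, o] for o in members if o != m)
        reps.append(min(members, key=total))
    return reps

# Synthetic example: six hypothetical marks driven by three latent profiles.
rng = random.Random(0)
n_regions = 200
latent = {g: [rng.gauss(0, 1) for _ in range(n_regions)] for g in "ABC"}
signals = {
    f"mark_{g}{k}": [v + rng.gauss(0, 0.3) for v in latent[g]]
    for g in "ABC" for k in (1, 2)
}

clusters, dist = cluster_marks(signals, n_clusters=3)
selected = pick_representatives(clusters, dist)
print(clusters)
print(selected)
```

In this toy setting, marks that share a latent profile are highly correlated and end up in the same cluster, so keeping one mark per cluster removes redundant signals. The number of clusters is fixed by hand here; how the framework chooses the subset quantitatively is not specified in this abstract.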
A case study conducted at promoter regions is presented: four out of twelve chromatin modifications were selected, eight different promoter states were identified, and the identified patterns of active promoters were then used to discover novel promoter regions. Several previously unannotated promoters were discovered, and further investigation confirmed their promoter functions.
This framework is appropriately general and could lead to a better understanding of epigenetic regulation by discovering previously unknown regulatory regions.