Murgas Leandro, Pollastri Gianluca, Riquelme Erick, Sáez Mauricio, Martin Alberto J M
Programa de Doctorado en Genómica Integrativa, Vicerrectoría de investigación, Universidad Mayor, Camino La Pirámide 5750, 8580745 Huechuraba, Chile.
Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Avda. del Valle 725, 8580702 Huechuraba, Chile.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae638.
Structural changes of chromatin modulate access to DNA for the molecular machinery involved in the control of transcription. These changes are linked to variations in epigenetic marks, which allow chromatin to be classified into different functional states according to the pattern of these histone marks. Importantly, alterations in chromatin states are known to be linked to various diseases, and changes in these states are known to explain processes such as cellular proliferation. For most available samples, there are not enough epigenomic data to accurately determine the chromatin states of the cells in each sample. This is mainly due to the high cost of these experiments, but also to insufficient sample material or sample degradation. In this work, we describe a cascade method based on a random forest algorithm that infers epigenetic marks and, in doing so, identifies relationships between different histone marks. Importantly, our approach also reduces the number of experimentally determined marks required to assign chromatin states. Moreover, we have identified several relationships between the patterns of different histone marks, which strengthens the evidence in favor of a redundant epigenetic code.
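The cascade idea in the abstract (infer one missing mark, then use it as an input when inferring the next) can be illustrated with a minimal sketch. This is not the authors' implementation. The encoding of each mark as 0/1 (absent/present) per genomic bin, the inference order, the tree depth, the number of trees, the synthetic marks `M1` to `M4`, and every function name (`train_forest`, `cascade_impute`, etc.) are assumptions made for illustration. The random forest itself is a tiny pure-Python version (bootstrap samples, a random feature subset at each split, majority vote) rather than a production library.

```python
import random
from collections import Counter

# Sketch only: marks are encoded as 0/1 (absent/present) per genomic bin (assumption).

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, features, depth, max_depth, n_try, rng):
    """Grow a decision tree; leaves are labels, internal nodes are (feature, left, right)."""
    if depth >= max_depth or len(set(y)) == 1 or not features:
        return majority(y)
    best = None
    # Random feature subset at each split, as in a random forest.
    for f in rng.sample(features, min(n_try, len(features))):
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority(y)
    _, f, left, right = best
    rest = [g for g in features if g != f]  # a binary feature is constant after splitting on it
    return (f,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       rest, depth + 1, max_depth, n_try, rng),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       rest, depth + 1, max_depth, n_try, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, left, right = node
        node = right if row[f] == 1 else left
    return node

def train_forest(X, y, n_trees=25, max_depth=4, seed=0):
    """Bagged trees on bootstrap samples, sqrt(d) candidate features per split."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_try = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 list(range(d)), 0, max_depth, n_try, rng))
    return forest

def predict_forest(forest, row):
    return majority([predict_tree(t, row) for t in forest])

def cascade_impute(reference, query, targets, **forest_kw):
    """Infer each mark in `targets`, in order, feeding earlier predictions forward.

    reference: mark -> 0/1 list for a fully profiled reference (hypothetical input).
    query: mark -> 0/1 list for the experimentally measured marks of a new sample.
    """
    feats = list(query)
    query = dict(query)  # copy; imputed marks are added to it
    n_ref = len(reference[targets[0]])
    n_q = len(next(iter(query.values())))
    for target in targets:
        # Training uses the reference's measured values for earlier targets (a design choice).
        X = [[reference[m][i] for m in feats] for i in range(n_ref)]
        forest = train_forest(X, reference[target], **forest_kw)
        Xq = [[query[m][i] for m in feats] for i in range(n_q)]
        query[target] = [predict_forest(forest, row) for row in Xq]
        feats.append(target)  # the predicted mark becomes an input for the next step
    return query

# Toy usage with synthetic marks M1..M4 (not real histone marks): M3 = M1 AND M2, M4 = NOT M3.
m1, m2 = [0, 0, 1, 1] * 50, [0, 1, 0, 1] * 50
m3 = [a & b for a, b in zip(m1, m2)]
reference = {"M1": m1, "M2": m2, "M3": m3, "M4": [1 - v for v in m3]}
imputed = cascade_impute(reference, {"M1": [0, 0, 1, 1], "M2": [0, 1, 0, 1]}, ["M3", "M4"])
print(imputed["M3"], imputed["M4"])  # [0, 0, 0, 1] [1, 1, 1, 0]
```

In this toy setting `M4` is learned from `M1`, `M2` and the predicted `M3`, so only two marks need to be measured in the new sample. This mirrors, under the stated assumptions, how a cascade can reduce the number of experimentally determined marks.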