Quan Yuan, Liang Fengji, Deng Si-Min, Zhu Yuexing, Chen Ying, Xiong Jianghui
Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China.
Lab of Epigenetics and Advanced Health Technology, Space Science and Technology Institute, Shenzhen, China.
Front Mol Biosci. 2021 Mar 26;8:597513. doi: 10.3389/fmolb.2021.597513. eCollection 2021.
Epigenetics is an essential biological frontier linking genetics to the environment, and DNA methylation is one of the most studied epigenetic events. In recent years, epigenome-wide association studies (EWAS) have identified thousands of phenotype-related methylation sites. However, the overlap between the phenotype-related DNA methylation sites identified in different studies is often quite small, possibly because methylation remodeling has a certain degree of randomness across the genome. Thus, identifying robust gene-phenotype associations is crucial to interpreting pathogenesis. How to integrate the methylation values of different sites on the same gene and mine DNA methylation at the gene level remains a challenge. A recent study found that the difference in DNA methylation between the gene body and the promoter region is strongly correlated with gene expression. In this study, we proposed the Statistical difference of DNA Methylation between Promoter and Other Body Region (SIMPO) algorithm to extract gene-level DNA methylation values. First, choosing smoking as an environmental exposure factor, we showed that our method significantly improved gene overlap between different datasets (from 5 to 17%). In addition, the biological significance of phenotype-related genes identified by the SIMPO algorithm is comparable to that of traditional probe-based methods. Then, using two disease contexts (insulin resistance and Parkinson's disease), we showed that the biological efficiency of disease-related gene identification increased from 15.43 to 44.44% (P-value = 1.20e-28). In summary, our results indicate that mining the selective remodeling of DNA methylation in promoter regions can identify robust gene-level associations with phenotype, and that the characteristic remodeling of a given gene's promoter region can reflect the essence of disease.
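The abstract does not specify which statistic SIMPO uses to contrast the promoter with the rest of the gene body. As a minimal sketch, assuming per-sample beta values and a probe annotation that tags each probe as either "promoter" or "body", one could summarize each gene in a sample by a Welch t statistic comparing its promoter probes against its other body probes. The function names (`welch_t`, `gene_level_scores`), the input layout, and the choice of Welch's t are illustrative assumptions, not the authors' implementation.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(promoter, body):
    """Welch's t statistic for promoter vs. other-body beta values.

    Assumption: a stand-in for SIMPO's "statistical difference";
    the published method may use a different statistic.
    """
    if len(promoter) < 2 or len(body) < 2:
        raise ValueError("need at least two probes in each region")
    m_p, m_b = mean(promoter), mean(body)
    se = sqrt(variance(promoter) / len(promoter) + variance(body) / len(body))
    if se == 0:
        # No within-region spread: report only the direction of the difference.
        return 0.0 if m_p == m_b else copysign(inf, m_p - m_b)
    return (m_p - m_b) / se


def gene_level_scores(betas, annotation):
    """Compute one gene-level score per gene for a single sample.

    betas: probe ID -> beta value (0..1) for one sample (hypothetical layout).
    annotation: probe ID -> (gene, region), region in {"promoter", "body"}.
    Genes with fewer than two probes in either region are skipped.
    """
    by_gene = {}
    for probe, beta in betas.items():
        if probe not in annotation:
            continue  # unannotated probe: ignore
        gene, region = annotation[probe]
        by_gene.setdefault(gene, {"promoter": [], "body": []})[region].append(beta)

    scores = {}
    for gene, regions in by_gene.items():
        if len(regions["promoter"]) >= 2 and len(regions["body"]) >= 2:
            scores[gene] = welch_t(regions["promoter"], regions["body"])
    return scores


betas = {"cg1": 0.1, "cg2": 0.2, "cg3": 0.8, "cg4": 0.9, "cg5": 0.5}
annotation = {
    "cg1": ("GENE_A", "promoter"),
    "cg2": ("GENE_A", "promoter"),
    "cg3": ("GENE_A", "body"),
    "cg4": ("GENE_A", "body"),
    "cg5": ("GENE_B", "body"),
}
scores = gene_level_scores(betas, annotation)
```

In this toy example GENE_A gets a strongly negative score (a hypomethylated promoter relative to its body), while GENE_B is skipped for lack of promoter probes. Scoring each gene by a within-gene contrast, rather than by individual probes, is the kind of gene-level aggregation the abstract argues is more reproducible across datasets.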