Department of Biomedical Informatics, Stony Brook University School of Medicine, Stony Brook, NY 11794, USA.
Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA.
Bioinformatics. 2021 Dec 22;38(1):22-29. doi: 10.1093/bioinformatics/btab637.
Conservation is broadly used to identify biologically important (epi)genomic regions. In the case of tumor growth, preferential conservation of DNA methylation can be used to identify areas of particular functional importance to the tumor. However, reliable assessment of methylation conservation based on multiple tissue samples per patient requires the decomposition of methylation variation at multiple levels.
We developed a Bayesian hierarchical model that decomposes methylation variance on three levels: between-patient normal tissue variation, between-patient tumor-effect variation and within-patient tumor variation. We then defined a model-based conservation score to identify loci where within-tumor methylation variation is reduced relative to between-patient variation. We fit the model to multi-sample methylation array data from 21 colorectal cancer (CRC) patients using a Markov chain Monte Carlo (MCMC) algorithm implemented in Stan. Sets of genes implicated in CRC tumorigenesis exhibited preferential conservation, demonstrating the model's ability to identify functionally relevant genes based on methylation conservation. A pathway analysis of preferentially conserved genes implicated several CRC-relevant pathways, as well as pathways related to neoantigen presentation and immune evasion. Our findings suggest that preferential methylation conservation may be used to identify novel gene targets that are not consistently mutated in CRC. The model's flexible structure makes it amenable to the analysis of more complex multi-sample data structures.
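The abstract does not state the model's equations, so the following is only a rough illustration of what a three-level variance decomposition and a conservation score can look like. It is a minimal sketch under several assumptions: a simple Gaussian random-effects form for the three levels, one normal-tissue sample per patient, and a plain method-of-moments estimator swapped in for the Bayesian MCMC fit in Stan. The function names and the `conservation_score` definition (log ratio of between-patient to within-patient tumor standard deviation) are hypothetical and are not the authors' definitions; the linked repository holds the actual model.

```python
import math
import random
import statistics
from dataclasses import dataclass


@dataclass
class PatientSamples:
    """Methylation values for one patient at one locus (hypothetical layout)."""
    normal: float        # assumption: a single normal-tissue sample per patient
    tumor: list[float]   # several tumor samples from the same patient


def simulate_locus(n_patients, n_tumor, mu, sd_normal, sd_tumor_effect, sd_within, rng=None):
    """Draw data from an assumed three-level Gaussian random-effects model.

    normal_p    = mu + a_p
    tumor_{p,s} = mu + a_p + b_p + w_{p,s}
    a_p ~ N(0, sd_normal^2)        between-patient normal variation
    b_p ~ N(0, sd_tumor_effect^2)  between-patient tumor-effect variation
    w_{p,s} ~ N(0, sd_within^2)    within-patient tumor variation
    """
    rng = rng if rng is not None else random.Random()
    patients = []
    for _ in range(n_patients):
        a = rng.gauss(0.0, sd_normal)
        b = rng.gauss(0.0, sd_tumor_effect)
        tumor = [mu + a + b + rng.gauss(0.0, sd_within) for _ in range(n_tumor)]
        patients.append(PatientSamples(normal=mu + a, tumor=tumor))
    return patients


def decompose_variance(patients):
    """Method-of-moments estimates of the three variance components.

    Stand-in for the Bayesian fit: returns (var_normal, var_tumor_effect, var_within).
    """
    # Between-patient normal variation: spread of the normal samples.
    var_normal = statistics.variance([p.normal for p in patients])

    # Within-patient tumor variation: pooled variance around each patient's tumor mean.
    ss, dof = 0.0, 0
    for p in patients:
        m = statistics.fmean(p.tumor)
        ss += sum((x - m) ** 2 for x in p.tumor)
        dof += len(p.tumor) - 1
    var_within = ss / dof

    # Per-patient tumor effect (mean tumor minus normal) has variance
    # var_tumor_effect + var_within / n_p, so subtract the sampling term.
    effects = [statistics.fmean(p.tumor) - p.normal for p in patients]
    sampling = statistics.fmean([var_within / len(p.tumor) for p in patients])
    var_tumor_effect = max(statistics.variance(effects) - sampling, 0.0)
    return var_normal, var_tumor_effect, var_within


def conservation_score(var_normal, var_tumor_effect, var_within):
    """Hypothetical score: log of between-patient over within-patient tumor SD.

    Positive values mean tumor methylation varies less within a patient than
    between patients, i.e. the locus looks preferentially conserved.
    """
    return 0.5 * math.log((var_normal + var_tumor_effect) / var_within)
```

For example, simulating 21 patients with 4 tumor samples each and comparing a locus with small `sd_within` against one with large `sd_within` should give the first a clearly higher score. A full Bayesian fit would additionally propagate uncertainty in each variance component into the score, which this moment-based sketch does not do.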
The data underlying this article are available in the NCBI GEO Database, under accession code GSE166212. The R analysis code is available at https://github.com/kevin-murgas/DNAmethylation-hierarchicalmodel.
Supplementary data are available at Bioinformatics online.