Shokoohi Farhad, Khaniki Saeedeh Hajebi
Department of Mathematical Sciences, University of Nevada-Las Vegas, Las Vegas, NV 89154, USA.
Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
bioRxiv. 2023 Jun 15:2023.06.15.545168. doi: 10.1101/2023.06.15.545168.
Epigenetic alterations are key drivers of cancer development and progression. Identifying differentially methylated cytosines (DMCs) in cancer samples is a crucial step toward understanding these changes. In this paper, we propose DMCTHM, a trans-dimensional Markov chain Monte Carlo (TMCMC) approach that applies hidden Markov models (HMMs) with binomial emissions to bisulfite sequencing (BS-Seq) data to identify DMCs in cancer epigenetic studies. We introduce the Expander-Collider penalty to tackle under- and over-estimation in TMCMC-HMMs. We address all known challenges inherent in BS-Seq data by introducing novel approaches for capturing the functional patterns and autocorrelation structure of the data, and for handling missing values, multiple covariates, multiple comparisons, and family-wise errors. We demonstrate the effectiveness of DMCTHM through comprehensive simulation studies, in which our proposed method outperforms competing methods in identifying DMCs. Notably, with DMCTHM, we uncovered new DMCs and genes in colorectal cancer that were significantly enriched in the Tp53 pathway.
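To make "hidden Markov model with binomial emissions" concrete: at each CpG site t we observe m_t methylated reads out of n_t covering reads, a hidden state (for example, low versus high methylation) evolves along the genome as a Markov chain, and given hidden state k the methylated count is modeled as Binomial(n_t, p_k). The sketch below computes the log-likelihood of such a model with the standard forward algorithm. It is an illustration under these assumptions only: the function names, the two-state setup, and all parameter values are hypothetical, and it does not implement DMCTHM's trans-dimensional MCMC, the Expander-Collider penalty, or the covariate and multiple-testing handling described in the abstract.

```python
import math
from typing import List, Sequence


def log_binom_pmf(m: int, n: int, p: float) -> float:
    """log P(M = m) for M ~ Binomial(n, p); requires 0 < p < 1 and 0 <= m <= n."""
    return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
            + m * math.log(p) + (n - m) * math.log1p(-p))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    mx = max(xs)
    return mx + math.log(sum(math.exp(x - mx) for x in xs))


def hmm_log_likelihood(meth: Sequence[int], cov: Sequence[int],
                       init: Sequence[float], trans: Sequence[Sequence[float]],
                       probs: Sequence[float]) -> float:
    """Forward algorithm for a K-state HMM with binomial emissions (hypothetical sketch).

    meth[t], cov[t]: methylated and total read counts at site t
    init[k]:         initial state probabilities (all > 0)
    trans[j][k]:     P(state k at site t | state j at site t-1) (all > 0)
    probs[k]:        methylation probability in hidden state k
    """
    K = len(init)
    # alpha[k] = log P(observations up to site t, hidden state at site t = k)
    alpha: List[float] = [math.log(init[k]) + log_binom_pmf(meth[0], cov[0], probs[k])
                          for k in range(K)]
    for t in range(1, len(meth)):
        # Sum over the previous state j, then add the emission at site t.
        # A site with cov[t] == 0 has emission log-probability 0 (probability 1),
        # so in this sketch it only propagates the hidden state without adding evidence.
        alpha = [logsumexp([alpha[j] + math.log(trans[j][k]) for j in range(K)])
                 + log_binom_pmf(meth[t], cov[t], probs[k])
                 for k in range(K)]
    return logsumexp(alpha)


if __name__ == "__main__":
    # Hypothetical two-state example: state 0 = low, state 1 = high methylation.
    meth = [1, 0, 9, 8, 0]
    cov = [10, 0, 10, 9, 12]
    ll = hmm_log_likelihood(meth, cov, init=[0.5, 0.5],
                            trans=[[0.9, 0.1], [0.1, 0.9]], probs=[0.1, 0.9])
    print(f"log-likelihood: {ll:.3f}")
```

Working in log space avoids numerical underflow over long runs of sites. Treating zero-coverage sites as uninformative is only one simple convention for missing data in this toy model; the paper's own treatment of missing values may differ.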