Gu Chiyu, Baladandayuthapani Veerabhadran, Guha Subharup
Formerly at the University of Missouri. Currently employed at Bayer Crop Science, 700 Chesterfield Pkwy W, Chesterfield, MO 63017.
Department of Biostatistics, University of Michigan.
Bayesian Anal. 2025 Jun;20(2):489-518. doi: 10.1214/23-ba1407. Epub 2023 Nov 23.
DNA methylation datasets in cancer studies consist of measurements on a large number of genomic locations called cytosine-phosphate-guanine (CpG) sites with complex correlation structures. A fundamental goal of these studies is the development of statistical techniques that can identify disease genomic signatures across multiple patient groups defined by different experimental or biological conditions. We propose BayesDiff, a nonparametric Bayesian approach for differential analysis relying on a novel class of first-order mixture models called the Sticky Pitman-Yor process or two-restaurant two-cuisine franchise (2R2CF). The BayesDiff methodology flexibly utilizes information from all CpG sites or biomarker probes, adaptively accommodates any serial dependence due to the widely varying inter-probe distances, and makes posterior inferences about the differential genomic signature of patient groups. Using simulation studies, we demonstrate the effectiveness of the BayesDiff procedure relative to existing statistical techniques for differential DNA methylation. The methodology is applied to analyze a gastrointestinal (GI) cancer dataset exhibiting serial correlation and complex interaction patterns. The results support and complement known aspects of DNA methylation and gene association in upper GI cancers.