Tian Yijun, McDonnell Shannon K, Wu Lang, Larson Nicholas B, Wang Liang
Department of Tumor Microenvironment and Metastasis, Moffitt Cancer Center, Tampa, FL 33612, United States.
Division of Clinical Trials and Biostatistics, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, United States.
bioRxiv. 2024 Sep 28:2024.09.27.614715. doi: 10.1101/2024.09.27.614715.
5-methylcytosine (5mC) is the most common chemical modification at CpG sites across the human genome. Bisulfite conversion combined with short-read whole-genome sequencing can capture and quantify this modification at single-nucleotide resolution. However, PCR amplification can produce duplicated methylation patterns and introduce 5mC detection bias, and the limited read length restricts co-methylation analysis between distant CpG sites. Bisulfite conversion also makes variant-specific methylation difficult to detect because it destroys allele information in the sequencing reads. To address these issues, we characterized human methylation profiles with nanopore long-read sequencing, aiming to demonstrate its potential for long-range co-methylation analysis with native modification calls and intact allele information. We first analyzed nanopore demo data from an adaptive sampling sequencing run targeting all human CpG islands. We applied the linkage disequilibrium (LD) R² statistic to quantify co-methylation in the nanopore data and identified 27,875, 50,481, 26,542 and 51,189 methylation haplotype blocks (MHBs) in the COLO829, COLO829BL, HCC1395 and HCC1395BL cell lines, respectively. Interestingly, while the majority of co-methylation occurred over a short range (≤200 bp), a small portion (1-3%) spanned long distances (≥1,000 bp), suggesting potential remote regulatory mechanisms across the genome. To further characterize the epigenetic changes related to transcription factor binding, we profiled 5mC percentage changes surrounding motif sites in the JASPAR collection and found that CTCF and KLF5 binding sites showed reduced methylation, while FOXE1 and ZNF354A sites showed increased methylation.
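As an illustration of how an LD-style statistic can be applied to co-methylation, the following minimal sketch computes the squared correlation between two CpG sites from per-read binary methylation calls. It assumes the statistic is the standard LD r² and that each read spanning both sites contributes one pair of 0/1 calls; the function name `comethylation_r2` and the input layout are hypothetical and are not taken from the study's pipeline.

```python
import math


def comethylation_r2(site_a, site_b):
    """LD-style r^2 between two CpG sites.

    site_a and site_b are equal-length sequences of 0/1 calls
    (1 = methylated), one entry per read covering both sites.
    This input layout is an assumption made for illustration.
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("need the same non-zero number of reads at both sites")

    n = len(site_a)
    p_a = sum(site_a) / n  # fraction of reads methylated at site A
    p_b = sum(site_b) / n  # fraction of reads methylated at site B
    # fraction of reads methylated at both sites
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # a site with no variation across reads has an undefined r^2
        return math.nan

    d = p_ab - p_a * p_b  # analogue of the LD coefficient D
    return d * d / denom
```

Under these assumptions, reads that are always concordant (or always discordant) at the two sites give a value of 1, and statistically independent sites give 0. Grouping adjacent sites with high pairwise values into blocks is a separate step that is not sketched here.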
To investigate allele-specific 5mCG in the prostate genome, we designed a target region covering methylation quantitative trait loci (mQTL) and germline risk variants from genome-wide association studies (GWAS), and generated long reads with an adaptive sampling run in the 22Rv1 cell line. We then performed long-read-based phasing and compared the 5mCG signals between the two haplotypes, identifying 6,390 haplotype-specific methylated regions (p-MWU ≤ 1e-5 and delta ≥ 50%). By examining haplotype-specific methylated regions near the phasing variants, we identified examples of allele-specific methylated regions that also showed allele-specific accessibility in ATAC-seq data. By further integrating 22Rv1 ATAC-seq data, we found that methylation levels were negatively correlated with chromatin accessibility at the genome-wide scale. Our study demonstrates native methylome profiling that preserves haplotype information, offering a novel approach to uncovering the regulatory mechanisms of the human prostate genome.
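The two thresholds quoted above (p-MWU ≤ 1e-5 and delta ≥ 50%) can be illustrated with a minimal sketch. It assumes that "MWU" denotes the Mann-Whitney U test, that the compared values are per-read methylation fractions within a region for each haplotype, and that delta is the absolute difference of the haplotype means. The p-value here uses a normal approximation with tie correction and no continuity correction, which may differ from the test variant used in the study; all function names are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # pool the samples, remembering which group each value came from
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1..j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all values tied: no evidence of a difference

    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u_x, p_value


def is_haplotype_specific(hap1, hap2, p_cutoff=1e-5, delta_cutoff=0.5):
    """Apply both thresholds to per-read methylation fractions (0..1)."""
    _, p_value = mann_whitney_u(hap1, hap2)
    delta = abs(sum(hap1) / len(hap1) - sum(hap2) / len(hap2))
    return p_value <= p_cutoff and delta >= delta_cutoff


# Hypothetical region: 20 reads per haplotype, one highly methylated
hap1_reads = [0.80 + 0.005 * i for i in range(20)]
hap2_reads = [0.10 + 0.005 * i for i in range(20)]
flagged = is_haplotype_specific(hap1_reads, hap2_reads)
```

Requiring both a small p-value and a large delta means a region is reported only when the difference is both statistically supported and large in magnitude. With only a few reads per haplotype, even fully separated values cannot reach a p-value of 1e-5.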