Department of Cell Biology, Albert Einstein College of Medicine, Bronx, New York, United States of America.
Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, United States of America.
PLoS Comput Biol. 2021 Sep 7;17(9):e1009323. doi: 10.1371/journal.pcbi.1009323. eCollection 2021 Sep.
The B cells in our body generate protective antibodies by introducing somatic hypermutations (SHM) into the variable region of immunoglobulin genes (IgVs). The mutations are generated by activation-induced deaminase (AID), which converts cytosine to uracil in single-stranded DNA (ssDNA) generated during transcription. Attempts have been made to correlate SHM with ssDNA by using bisulfite to chemically convert cytosines that are accessible in the intact chromatin of mutating B cells. These studies have been complicated by the use of different definitions of "bisulfite accessible regions" (BARs). Recently, deep sequencing has provided much larger datasets of such regions, but computational methods are needed to enable this analysis. Here we leveraged deep sequencing with unique molecular identifiers and developed a novel Hidden Markov Model-based Bayesian segmentation algorithm to characterize the ssDNA regions in the IGHV4-34 gene of the human Ramos B cell line. Combining hierarchical clustering with our new Bayesian model, we identified recurrent BARs in certain subregions of both the top and bottom strands of this gene. With this new system, we found that the average size of BARs is about 15 bp. We also identified potential G-quadruplex DNA structures in this gene and found that the BARs co-localize with G-quadruplex structures on the opposite strand. Across various correlation analyses, we found no direct site-to-site relationship between the bisulfite-accessible ssDNA and all sites of SHM, but most of the highly AID-mutated sites are within 15 bp of a BAR. In summary, we developed a novel platform to study single-stranded DNA in chromatin at base-pair resolution that reveals potential relationships among BARs, SHM and G-quadruplexes. This platform could be applied to genome-wide studies in the future.
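To make the segmentation idea concrete, the following is a minimal sketch of how a two-state hidden Markov model could label runs of converted cytosines on a single read as candidate accessible regions. It is not the authors' Bayesian segmentation algorithm: it uses plain Viterbi decoding with fixed parameters, and every name and number in it (`STATES`, `START`, `TRANS`, `P_CONVERTED`, `viterbi`, `open_segments`) is a hypothetical choice made for illustration. The input is assumed to be one read reduced to a list of 0/1 calls, one per cytosine, where 1 means the cytosine was converted by bisulfite.

```python
import math

# All parameters below are hypothetical, chosen for illustration only;
# none of them come from the paper.
STATES = ("closed", "open")  # "open" = candidate bisulfite-accessible (ssDNA) state
START = {"closed": 0.9, "open": 0.1}
TRANS = {
    "closed": {"closed": 0.95, "open": 0.05},
    "open": {"closed": 0.10, "open": 0.90},
}
# Probability that a cytosine is observed as converted (1) in each state.
P_CONVERTED = {"closed": 0.05, "open": 0.90}


def viterbi(obs):
    """Return the most likely state path for a list of 0/1 conversion calls."""
    if not obs:
        return []

    def emit(state, o):
        p = P_CONVERTED[state]
        return math.log(p if o == 1 else 1.0 - p)

    # Log-probability of the best path ending in each state at position 0.
    score = {s: math.log(START[s]) + emit(s, obs[0]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + emit(s, o)
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def open_segments(path):
    """Collapse a state path into (start, end) runs of 'open', end exclusive.

    Indices count cytosines along the read, not base-pair coordinates.
    """
    segments, start = [], None
    for i, s in enumerate(path):
        if s == "open" and start is None:
            start = i
        elif s != "open" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments
```

Calling `open_segments(viterbi(calls))` on one read's calls yields the candidate segments for that read. A Bayesian treatment such as the one the abstract describes would infer the parameters from the data rather than fix them, which is the main simplification in this sketch.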