Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.
Genome Res. 2013 Sep;23(9):1395-409. doi: 10.1101/gr.152454.112. Epub 2013 May 8.
We delineated and analyzed directly oriented paralogous low-copy repeats (DP-LCRs) in the most recent version of the human haploid reference genome. The computationally defined DP-LCRs were cross-referenced with our chromosomal microarray analysis (CMA) database of 25,144 patients subjected to genome-wide assays. Applying this computationally guided approach to the large, empirically derived data set allowed us to investigate the relative frequencies of genomic rearrangements and to identify new loci for recurrent nonallelic homologous recombination (NAHR)-mediated copy-number variants (CNVs). The most commonly observed recurrent CNVs were NPHP1 duplications (233), CHRNA7 duplications (175), and 22q11.21 deletions (DiGeorge/velocardiofacial syndrome, 166). In the ∼25% of CMA cases for which parental studies were available, we identified 190 de novo recurrent CNVs. In this group, the most frequently observed events were deletions of 22q11.21 (48), 16p11.2 (autism, 34), and 7q11.23 (Williams-Beuren syndrome, 11). Several features of DP-LCRs, including length, distance between NAHR substrate elements, DNA sequence identity (fraction matching), GC content, and concentration of the homologous recombination (HR) hot spot motif 5'-CCNCCNTNNCCNC-3', correlate with the frequencies of the recurrent CNV events. Four novel adjacent DP-LCR-flanked and NAHR-prone regions, involving 2q12.2q13, were elucidated in association with novel genomic disorders. Our study quantifies the genome architectural features responsible for NAHR-mediated genomic instability and further elucidates the role of NAHR in human disease.
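Two of the sequence features named above, GC content and the concentration of the HR hot spot motif 5'-CCNCCNTNNCCNC-3', can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not describe. The function names are hypothetical, and scanning both strands, counting overlapping matches, and reporting density per kilobase are assumptions made only for illustration.

```python
import re

# Hot spot motif as given in the abstract; N stands for any base.
MOTIF = "CCNCCNTNNCCNC"


def iupac_to_regex(motif):
    """Turn a motif containing N wildcards into a regex fragment."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_motif(seq, motif=MOTIF):
    """Count motif occurrences on both strands (assumption: both strands count).

    Zero-width lookaheads are used so that overlapping matches are counted.
    """
    seq = seq.upper()
    forward = re.compile("(?=" + iupac_to_regex(motif) + ")")
    reverse = re.compile("(?=" + iupac_to_regex(reverse_complement(motif)) + ")")
    return sum(1 for _ in forward.finditer(seq)) + sum(1 for _ in reverse.finditer(seq))


def motif_density_per_kb(seq, motif=MOTIF):
    """Motif occurrences per 1,000 bases (hypothetical normalization)."""
    return 1000.0 * count_motif(seq, motif) / len(seq) if seq else 0.0


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

For a pair of LCR sequences, these values could be computed per repeat and compared against the observed CNV counts. How the features were actually measured and correlated is specified in the full paper rather than in this abstract.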