Porubsky David, Yoo DongAhn, Dishuck Philip C, Koundinya Nidhi, Souche Erika, Harvey William T, Munson Katherine M, Hoekzema Kendra, Chan Daniel D, Leung Tiffany Y, Santos Marta S, Meynants Senne, Swillen Ann, Breckpot Jeroen, Tsapalou Vasiliki, Hasenfeld Patrick, Korbel Jan O, Lansdorp Peter M, Vermeesch Joris R, Eichler Evan E
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany.
bioRxiv. 2025 Jul 7:2025.07.04.662981. doi: 10.1101/2025.07.04.662981.
The most common genomic disorder, chromosome 22q11.2 microdeletion syndrome (22q11.2DS), is mediated by highly identical and polymorphic segmental duplications (SDs) known as low copy repeats (LCRs; regions A-D), which have been challenging to sequence and characterize. Here, we report the sequence-resolved genomic architecture of 135 chromosome 22q11.2 haplotypes from diverse 1000 Genomes Project samples. We find that more than 90% of the copy number variation is polarized to the most proximal LCR, region A (LCRA), where 50 distinct structural configurations are observed (~189 kbp to ~2.15 Mbp, an 11-fold length variation). A 105 kbp higher-order SD cassette, flanked by 25 kbp inverted repeats, drives this variation; it emerged in the human-chimpanzee ancestral lineage and later expanded in humans ~1.0 [0.8-1.2] million years ago. African LCRA haplotypes are significantly longer than non-African haplotypes (p=0.0047), yet they are predicted to be better protected against recurrent microdeletions (p=0.00053) owing to a preponderance of flanking SDs in inverted orientation. Conversely, we identified nine distinct inversion polymorphisms, including five recurrent ~2.28 Mbp inversions extending across the critical region (LCRA-D) and four smaller inversions (two LCRA-B, one LCRC-D, and one LCRB-D); seven of these nine events were identified in haplotypes of African and admixed American ancestry. Finally, we sequenced and assembled genomes from four families and found that LCRA-D deletion breakpoints map to the 105 kbp repeat unit, whereas inversion breakpoints are associated with the 25 kbp repeats adjacent to palindromic AT-rich regions. In one family, we observed evidence of more complex unequal crossover events involving gene conversion and multiple breakpoints.
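The orientation argument above rests on the standard non-allelic homologous recombination (NAHR) rule: recombination between directly oriented repeat copies deletes (or duplicates) the intervening sequence, whereas recombination between inverted copies inverts it. The Python sketch below is a toy illustration of that rule only, not the authors' analysis pipeline. The `RepeatCopy` class, the `nahr_outcome` and `deletion_prone_fraction` helpers, and all coordinates and strands are hypothetical. It scores a haplotype by the fraction of LCRA/LCRD repeat pairs in direct orientation, so a haplotype dominated by inverted copies scores lower, meaning it is less deletion-prone under this simplified model.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RepeatCopy:
    """One copy of a homologous repeat unit on a haplotype (hypothetical model)."""
    lcr: str       # LCR block label, e.g. "A" or "D"
    position: int  # illustrative start coordinate in bp
    strand: str    # "+" or "-", orientation relative to a reference copy


def nahr_outcome(left: RepeatCopy, right: RepeatCopy) -> str:
    """Classic NAHR rule: direct repeats -> deletion/duplication, inverted -> inversion."""
    return "deletion" if left.strand == right.strand else "inversion"


def deletion_prone_fraction(copies, proximal="A", distal="D") -> float:
    """Fraction of proximal/distal repeat pairs in direct orientation,
    i.e. pairs that could mediate a proximal-distal deletion in this toy model."""
    left = [c for c in copies if c.lcr == proximal]
    right = [c for c in copies if c.lcr == distal]
    pairs = list(product(left, right))
    if not pairs:
        return 0.0  # no flanking pair, so no NAHR substrate in this model
    direct = sum(1 for a, b in pairs if nahr_outcome(a, b) == "deletion")
    return direct / len(pairs)


# Hypothetical haplotypes: all LCRA copies direct vs. mostly inverted relative to LCRD.
direct_hap = [
    RepeatCopy("A", 0, "+"),
    RepeatCopy("A", 105_000, "+"),
    RepeatCopy("D", 2_500_000, "+"),
]
inverted_hap = [
    RepeatCopy("A", 0, "+"),
    RepeatCopy("A", 105_000, "-"),
    RepeatCopy("A", 210_000, "-"),
    RepeatCopy("D", 2_500_000, "+"),
]

print(round(deletion_prone_fraction(direct_hap), 2))    # 1.0
print(round(deletion_prone_fraction(inverted_hap), 2))  # 0.33
```

Treating each repeat copy as a single oriented unit is a deliberate simplification; real LCR haplotypes contain nested, partially homologous SDs, so this only conveys why inverted flanking copies reduce the pool of deletion-competent pairs.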
Our findings suggest that specific haplotype configurations confer either protection against or susceptibility to 22q11.2DS, while recurrent large-scale inversions help explain why this syndrome is less prevalent among individuals of African descent.