Weissensteiner Matthias H, Pang Andy W C, Bunikis Ignas, Höijer Ida, Vinnere-Petterson Olga, Suh Alexander, Wolf Jochen B W
Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden.
Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilian University of Munich, 82152 Planegg-Martinsried, Germany.
Genome Res. 2017 May;27(5):697-708. doi: 10.1101/gr.215095.116. Epub 2017 Mar 30.
Accurate and contiguous genome assembly is key to a comprehensive understanding of the processes shaping genomic diversity and evolution. Yet, it is frequently constrained by constitutive heterochromatin, usually characterized by highly repetitive DNA. As a key feature of genome architecture associated with centromeric and subtelomeric regions, constitutive heterochromatin locally influences meiotic recombination. In this study, we assess the impact of large tandem repeat arrays on the recombination rate landscape in an avian speciation model, the Eurasian crow. We assembled two high-quality genome references using single-molecule real-time sequencing (long-read assembly [LR]) and single-molecule optical maps (optical map assembly [OM]). A three-way comparison including the published short-read assembly (SR) constructed for the same individual allowed us to assess assembly properties and pinpoint misassemblies. By combining information from all three assemblies, we characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit. Using whole-genome population resequencing data, we estimated the population-scaled recombination rate (ρ) and found it to be significantly reduced in these regions. These findings are consistent with an effect of low recombination in regions adjacent to centromeric or subtelomeric heterochromatin and add to our understanding of the processes generating widespread heterogeneity in genetic diversity and differentiation along the genome. By combining three different technologies, our results highlight the importance of adding a layer of information on genome structure that is inaccessible to each approach independently.
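The comparison of ρ between repeat-adjacent regions and the rest of the genome can be illustrated with a minimal sketch. The abstract does not state which estimator or statistical test was used, so everything below is an assumption made for illustration: the function names, the window and region tuples, the `flank` parameter, the toy ρ values, and the choice of a one-sided permutation test on the difference in means. For a diploid population, ρ is conventionally defined as 4·Ne·r, where Ne is the effective population size and r is the per-generation recombination rate.

```python
import random
from statistics import mean


def overlaps(window, regions, flank=0):
    """Return True if window (chrom, start, end) lies within `flank` bp of any region."""
    chrom, start, end = window
    return any(
        chrom == r_chrom and start < r_end + flank and end > r_start - flank
        for r_chrom, r_start, r_end in regions
    )


def split_by_proximity(windows, regions, flank=0):
    """Split per-window rho estimates into those near repeat regions and the rest."""
    near, far = [], []
    for chrom, start, end, rho in windows:
        target = near if overlaps((chrom, start, end), regions, flank) else far
        target.append(rho)
    return near, far


def permutation_p_value(near, far, n_perm=10_000, seed=0):
    """One-sided permutation p-value for the hypothesis mean(near) < mean(far)."""
    rng = random.Random(seed)
    observed = mean(near) - mean(far)
    pooled = list(near) + list(far)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(near)]) - mean(pooled[len(near):])
        if diff <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical values, not taken from the study).
# Each window is (chromosome, start, end, rho estimate).
toy_windows = [
    ("chr1", 0, 100, 0.10), ("chr1", 100, 200, 0.20), ("chr1", 200, 300, 0.15),
    ("chr1", 1000, 1100, 1.0), ("chr1", 1100, 1200, 1.2), ("chr1", 1200, 1300, 0.9),
    ("chr2", 0, 100, 1.1), ("chr2", 100, 200, 1.3), ("chr2", 200, 300, 0.8),
]
# Each repeat region is (chromosome, start, end).
toy_repeats = [("chr1", 50, 250)]

near_rho, far_rho = split_by_proximity(toy_windows, toy_repeats, flank=0)
p_value = permutation_p_value(near_rho, far_rho)
print(f"mean rho near repeats: {mean(near_rho):.3f}")
print(f"mean rho elsewhere:    {mean(far_rho):.3f}")
print(f"one-sided permutation p-value: {p_value:.4f}")
```

A permutation test is used here only because it makes no distributional assumption about ρ; neighbouring genomic windows are not independent, so a real analysis would likely need a scheme that accounts for spatial autocorrelation along the chromosome.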