Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097, Warsaw, Poland.
Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, 77030, Houston, TX, USA.
Genome Biol. 2023 Sep 11;24(1):205. doi: 10.1186/s13059-023-03022-8.
Resolving complex genomic regions rich in segmental duplications (SDs) is challenging due to the high error rate of long-read sequencing. Here, we describe a targeted approach with a novel genome assembler, PhaseDancer, that iteratively extends SD-rich regions of interest. We validate its robustness and efficiency using a gold-standard set of human BAC clones and in silico-generated SDs with predefined evolutionary scenarios. PhaseDancer enables extension of the incomplete, complex, SD-rich subtelomeric regions of Great Ape chromosomes orthologous to the human chromosome 2 (HSA2) fusion site, informing a model of HSA2 formation and unravelling the evolution of human and Great Ape genomes.