Department of Integrative Biology, University of California Berkeley, Berkeley, California, United States of America.
PLoS Biol. 2018 Jul 30;16(7):e2006348. doi: 10.1371/journal.pbio.2006348. eCollection 2018 Jul.
While short-read sequencing technology has sharply increased the number of species with genome assemblies, these assemblies are typically highly fragmented. Repeats pose the largest challenge for reference genome assembly, and pericentromeric regions and the repeat-rich Y chromosome are typically excluded from sequencing projects. Here, we assemble the genome of Drosophila miranda using long reads for contig formation; chromatin interaction maps for scaffolding; and short reads, optical mapping, and bacterial artificial chromosome (BAC) clone sequencing for consensus validation. Our assembly recovers entire chromosomes and contains large fractions of repetitive DNA, including about 41.5 Mb of pericentromeric and telomeric regions and >100 Mb of the recently formed, highly repetitive neo-Y chromosome. While Y chromosome evolution is typically characterized by global sequence loss and shrinkage, the neo-Y has increased in size almost 3-fold through the accumulation of repetitive sequences. Our high-quality assembly allows us to reconstruct the chromosomal events that led to the unusual sex chromosome karyotype of D. miranda, including the independent de novo formation of a pair of sex chromosomes at two distinct time points and the reversion of a former Y chromosome to an autosome.