Palacios-Gimenez Octavio M, Koelman Julia, Palmada-Flores Marc, Bradford Tessa M, Jones Karl K, Cooper Steven J B, Kawakami Takeshi, Suh Alexander
Department of Ecology and Genetics - Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36, Uppsala, Sweden.
Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36, Uppsala, Sweden.
BMC Biol. 2020 Dec 21;18(1):199. doi: 10.1186/s12915-020-00925-x.
Repetitive DNA sequences, including transposable elements (TEs) and tandemly repeated satellite DNAs (satDNAs), collectively called the "repeatome", make up a high proportion of the genome in organisms across the Tree of Life. Grasshoppers have large genomes, averaging 9 Gb, that contain a high proportion of repetitive DNA, which has hampered progress in assembling reference genomes. Here we combined linked-read genomics with transcriptomics to assemble, characterize, and compare the structure of repetitive DNA sequences in four chromosomal races of the morabine grasshopper Vandiemenella viatica species complex and to determine their contribution to genome evolution.
We obtained linked-read genome assemblies of 2.73-3.27 Gb for the four chromosomal races of V. viatica, whose estimated haploid genome sizes are 4.26-5.07 Gb. These constitute the third largest insect genomes assembled so far. Combining complementary annotation tools and manual curation, we found a large diversity of TEs and satDNAs, which together make up 66 to 75% of each genome assembly. A comparison of sequence divergence within the TE classes revealed massive accumulation of recent TEs in all four races (314-463 Mb per assembly), indicating that their large genome sizes are likely due to similar rates of TE accumulation. Transcriptome sequencing showed more biased TE expression in reproductive tissues than in somatic tissues, implying permissive transcription during gametogenesis. Of 129 satDNA families, 102 were shared among the four chromosomal races, which likely reflects the diversity of satDNA families present in the ancestor of the V. viatica chromosomal races. Notably, 50 of these shared satDNA families have undergone differential proliferation since the recent diversification of the V. viatica species complex.
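The counts and size ranges reported above can be turned into proportions with a few lines of arithmetic. The following is a minimal sketch, not part of the study's analysis: the variable and function names are illustrative, and because the abstract does not say which assembly size belongs to which genome size estimate, the assembled fraction is given only as loose lower and upper bounds.

```python
# Figures taken directly from the abstract; all names here are illustrative.
TOTAL_SATDNA_FAMILIES = 129
SHARED_SATDNA_FAMILIES = 102
DIFFERENTIALLY_PROLIFERATED = 50

ASSEMBLY_GB = (2.73, 3.27)      # smallest and largest assembly span
GENOME_SIZE_GB = (4.26, 5.07)   # smallest and largest estimated haploid genome size


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of satDNA families found in all four chromosomal races.
shared_pct = percent(SHARED_SATDNA_FAMILIES, TOTAL_SATDNA_FAMILIES)

# Share of the shared families that show differential proliferation.
proliferated_pct = percent(DIFFERENTIALLY_PROLIFERATED, SHARED_SATDNA_FAMILIES)

# Families not shared by all four races.
non_shared = TOTAL_SATDNA_FAMILIES - SHARED_SATDNA_FAMILIES

# Assumption: the race-by-race pairing of assembly and genome size is unknown,
# so these are only the widest bounds consistent with the reported ranges.
assembled_min_pct = percent(ASSEMBLY_GB[0], GENOME_SIZE_GB[1])
assembled_max_pct = percent(ASSEMBLY_GB[1], GENOME_SIZE_GB[0])

if __name__ == "__main__":
    print(f"Shared satDNA families: {shared_pct}% of {TOTAL_SATDNA_FAMILIES}")
    print(f"Differentially proliferated: {proliferated_pct}% of shared families")
    print(f"Families not shared by all races: {non_shared}")
    print(f"Assembled fraction bounds: {assembled_min_pct}% to {assembled_max_pct}%")
```

Under these figures, roughly four in five satDNA families are shared by all four races, and about half of the shared families show differential proliferation.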
This in-depth annotation of the repeatome in morabine grasshoppers provides new insights into the genome evolution of Orthoptera. Our TE analysis revealed a massive recent accumulation of TEs equivalent to the size of entire Drosophila genomes, which likely explains the large genome sizes in grasshoppers. Despite an overall high similarity of TE and satDNA diversity between races, the patterns of TE expression and satDNA proliferation suggest rapid evolution of grasshopper genomes on recent timescales.