Tørresen Ole K, Star Bastiaan, Jentoft Sissel, Reinar William B, Grove Harald, Miller Jason R, Walenz Brian P, Knight James, Ekholm Jenny M, Peluso Paul, Edvardsen Rolf B, Tooming-Klunderud Ave, Skage Morten, Lien Sigbjørn, Jakobsen Kjetill S, Nederbragt Alexander J
Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316, Norway.
Department of Natural Sciences, University of Agder, Kristiansand, NO-4604, Norway.
BMC Genomics. 2017 Jan 18;18(1):95. doi: 10.1186/s12864-016-3448-x.
The first Atlantic cod (Gadus morhua) genome assembly, published in 2011, was one of the early genome assemblies based exclusively on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies for complex genomes, although many of these are fragmented, with a significant fraction of bases in gaps. Long-read sequencing and improved software now enable the generation of more contiguous genome assemblies.
By combining data from Illumina, 454 and the longer-read PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% of those in promoter regions and 12% of those in coding sequences are heterozygous in the sequenced individual.
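The contiguity and gap-base figures above are the kind of summary statistics that can be computed directly from the assembled sequences. The abstract does not say which contiguity metric was used, so the sketch below assumes N50 as an illustrative stand-in, and assumes gaps are encoded as runs of `N` in the scaffolds. The function names (`n50`, `gap_fraction`, `split_contigs`) are hypothetical and are not taken from the paper.

```python
import re
from typing import List


def n50(lengths: List[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def gap_fraction(scaffolds: List[str]) -> float:
    """Fraction of all scaffold bases that are gap characters (assumed 'N')."""
    total = sum(len(s) for s in scaffolds)
    if total == 0:
        return 0.0
    gaps = sum(s.upper().count("N") for s in scaffolds)
    return gaps / total


def split_contigs(scaffold: str) -> List[str]:
    """Split a scaffold into contigs at runs of gap characters (assumed 'N')."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


if __name__ == "__main__":
    # Toy scaffolds, not real cod data.
    scaffolds = ["ACGTNNNNAC", "ACGTACGTAC"]
    contigs = [c for s in scaffolds for c in split_contigs(s)]
    print("scaffold N50:", n50([len(s) for s in scaffolds]))
    print("contig N50:", n50([len(c) for c in contigs]))
    print("gap fraction:", gap_fraction(scaffolds))
```

Comparing such values between two assembly versions would give fold-changes of the kind reported here, under the stated assumptions.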
The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes indicates considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.