Zapata Luis, Ding Jia, Willing Eva-Maria, Hartwig Benjamin, Bezdan Daniela, Jiao Wen-Biao, Patel Vipul, Velikkakam James Geo, Koornneef Maarten, Ossowski Stephan, Schneeberger Korbinian
Bioinformatics and Genomics Programme, Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain; Universitat Pompeu Fabra, 08002 Barcelona, Spain
Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
Proc Natl Acad Sci U S A. 2016 Jul 12;113(28):E4052-60. doi: 10.1073/pnas.1607532113. Epub 2016 Jun 27.
Resequencing and reference-based assemblies reveal much of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the full complement of large-scale rearrangements, including transpositions and inversions. Despite the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions do not differ from colinear regions in local divergence, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes that were present in only the reference sequence or only the Ler assembly, and 334 single-copy orthologs that showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing and other reference-dependent methods.
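To make the distinction between colinear and rearranged variation concrete, here is a minimal toy sketch in Python. It is not the pipeline used in the study: the `Block` type, the `classify` function, and the example coordinates are all hypothetical, and the approach (treating the longest chain of forward-strand alignment blocks with increasing positions in both genomes as the colinear backbone) is only one simple way such a classification could be set up for a single chromosome pair.

```python
from bisect import bisect_left
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical alignment block between a reference and a query assembly."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    strand: str  # "+" for same orientation, "-" for reverse orientation


def longest_increasing_chain(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tails = []      # tails[k]: smallest last value of an increasing run of length k + 1
    tails_idx = []  # index into `values` of the element stored in tails[k]
    prev = [-1] * len(values)
    for i, v in enumerate(values):
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
            tails_idx.append(i)
        else:
            tails[k] = v
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1
    chain = set()
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        chain.add(i)
        i = prev[i]
    return chain


def classify(blocks):
    """Label each block as 'colinear', 'inverted', or 'transposed' (toy rules)."""
    ordered = sorted(blocks, key=lambda b: b.ref_start)
    forward = [b for b in ordered if b.strand == "+"]
    backbone = longest_increasing_chain([b.qry_start for b in forward])
    labels = {b: "inverted" for b in ordered if b.strand == "-"}
    for i, b in enumerate(forward):
        labels[b] = "colinear" if i in backbone else "transposed"
    return labels


# Made-up coordinates for illustration only.
example = [
    Block(0, 100, 0, 100, "+"),
    Block(100, 200, 500, 600, "+"),  # same orientation, but out of order in the query
    Block(200, 300, 100, 200, "+"),
    Block(300, 400, 200, 300, "-"),  # reverse orientation
    Block(400, 500, 300, 400, "+"),
]
labels = classify(example)
rearranged_bp = sum(b.ref_end - b.ref_start
                    for b, kind in labels.items() if kind != "colinear")
```

In this toy input, the second block is labeled transposed and the fourth inverted, so `rearranged_bp` counts 200 reference positions. A read-mapping approach against the reference alone would see the local differences inside such blocks but not their changed order or orientation, which is the limitation the abstract describes.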