Bush Zachary D, Naftaly Alice F S, Dinwiddie Devin, Albers Cora, Hillers Kenneth J, Libuda Diana E
Institute of Molecular Biology, Department of Biology, University of Oregon, 1229 Franklin Blvd, Eugene, OR 97403, USA.
Biological Sciences Department, California Polytechnic State University, San Luis Obispo, California, USA.
bioRxiv. 2023 Nov 3:2023.01.13.523974. doi: 10.1101/2023.01.13.523974.
Genomic structural variations (SVs) and transposable elements (TEs) can be significant contributors to genome evolution, altered gene expression, and risk of genetic diseases. Recent advancements in long-read sequencing have greatly improved the quality of genome assemblies and enhanced the detection of sequence variants at the scale of hundreds or thousands of bases. Comparisons between two diverged wild isolates of Caenorhabditis elegans, the Bristol and Hawaiian strains, have been widely utilized in the analysis of small genetic variations. Genetic drift, including SVs and rearrangements of repeated sequences such as TEs, can occur over time during long-term maintenance of wild-type isolates within the laboratory. To comprehensively detect both large and small structural variations as well as TEs due to genetic drift, we generated genome assemblies and annotations for each strain from our lab collection using both long- and short-read sequencing and compared our assemblies and annotations with those of other lab wild-type strains. Within our lab assemblies, we annotate over 3.1 Mb of sequence divergence between the Bristol and Hawaiian isolates: 337,584 SNPs, 94,503 small insertion-deletions (<50 bp), and 4,334 structural variations (>50 bp). Further, we define the location and movement of specific DNA TEs between the N2 Bristol and CB4856 Hawaiian wild-type isolates. Specifically, we find the N2 Bristol genome has 20.6% more TEs from the family than the CB4856 Hawaiian genome. Moreover, we identified Zator elements as the most abundant and mobile TE family in the genome. Using specific TE sequences with unique SNPs, we also identify 38 TEs that moved intrachromosomally and 9 TEs that moved interchromosomally between the N2 Bristol and CB4856 Hawaiian genomes. By comparing the genome assembly of our lab collection Bristol isolate to the VC2010 Bristol assembly, we also reveal that lab lineages display over 2 Mb of total variation: 1,162 SNPs, 1,528 indels, and 897 SVs, with 95% of the variation due to SVs. Overall, our work demonstrates the unique contribution of SVs and TEs to variation and genetic drift between wild-type laboratory strains that are assumed to be isogenic despite growing evidence of genetic drift and phenotypic variation.
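The size classes named in the abstract (SNPs, small insertion-deletions under 50 bp, and larger structural variations) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `classify_variant`, the REF/ALT allele-string representation, and the choice to count a size of exactly 50 bp as an SV are assumptions made only for illustration.

```python
def classify_variant(ref: str, alt: str, sv_threshold: int = 50) -> str:
    """Classify one REF/ALT allele pair by size (hypothetical helper).

    Assumption: a length difference of exactly `sv_threshold` bp counts
    as an SV; the abstract only states <50 bp and >50 bp.
    """
    # A single-base substitution is a SNP.
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"
    size = abs(len(ref) - len(alt))
    if size == 0:
        # Same-length alleles that are not SNPs (e.g. multi-base
        # substitutions) fall outside the three classes sketched here.
        return "other"
    return "small_indel" if size < sv_threshold else "SV"


# Tally a few made-up allele pairs (illustrative data, not from the study).
examples = [("A", "G"), ("A", "ATTT"), ("A" + "C" * 120, "A")]
counts = {}
for ref, alt in examples:
    label = classify_variant(ref, alt)
    counts[label] = counts.get(label, 0) + 1
```

Real variant callers also report inversions, translocations, and other rearrangements that a simple length comparison like this cannot capture.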