Department of Biology, Colorado State University, Fort Collins, Colorado, USA.
Department of Environmental Sciences, Zoology, University of Basel, Basel, Switzerland.
Genome Biol Evol. 2022 May 3;14(5). doi: 10.1093/gbe/evac059.
Intracellular transfers of mitochondrial DNA continue to shape nuclear genomes. Chromosome 2 of the model plant Arabidopsis thaliana contains one of the largest known nuclear insertions of mitochondrial DNA (numts). Estimated at over 600 kb in size, this numt is larger than the entire Arabidopsis mitochondrial genome. The primary Arabidopsis nuclear reference genome contains less than half of the numt because of its structural complexity and repetitiveness. Recent data sets generated with improved long-read sequencing technologies (PacBio HiFi) provide an opportunity to finally determine the accurate sequence and structure of this numt. We performed a de novo assembly using sequencing data from recent initiatives to span the Arabidopsis centromeres, producing a gap-free sequence of the Chromosome 2 numt, which is 641 kb in length and has 99.933% nucleotide sequence identity with the actual mitochondrial genome. The numt assembly is consistent with the repetitive structure previously predicted from fiber-based fluorescent in situ hybridization. Nanopore sequencing data indicate that the numt has high levels of cytosine methylation, helping to explain its biased spectrum of nucleotide sequence divergence and supporting previous inferences that it is transcriptionally inactive. The original numt insertion appears to have involved multiple mitochondrial DNA copies with alternative structures that subsequently underwent an additional duplication event within the nuclear genome. This work provides insights into numt evolution, addresses one of the last unresolved regions of the Arabidopsis reference genome, and represents a resource for distinguishing between highly similar numt and mitochondrial sequences in studies of transcription, epigenetic modifications, and de novo mutations.
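The abstract reports a percent nucleotide identity between the numt and the mitochondrial genome, and a biased spectrum of sequence divergence. The sketch below computes both quantities from a pairwise alignment. It is not the authors' pipeline. The function name `alignment_stats` is hypothetical, and the code assumes the two sequences are already aligned: they have equal length and use `-` for gaps. Percent identity has several published definitions. This sketch uses matches divided by all non-empty alignment columns, so gap columns count against identity. Other tools exclude gaps or normalize by sequence length instead. Substitution types are tallied as `ref>query`, for example `C>T`. A C>T excess is one signature commonly associated with deamination of methylated cytosines.

```python
from collections import Counter


def alignment_stats(ref: str, query: str) -> tuple[float, Counter]:
    """Return (percent identity, substitution spectrum) for a pairwise alignment.

    Assumes ref and query are pre-aligned strings of equal length, with '-'
    marking gaps. Columns where both sequences have a gap are skipped; columns
    with a gap in only one sequence count as non-matching but are not added to
    the substitution spectrum.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    columns = 0
    spectrum: Counter = Counter()
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-" and q == "-":
            continue  # empty column, ignore
        columns += 1
        if r == q:
            matches += 1
        elif r != "-" and q != "-":
            spectrum[f"{r}>{q}"] += 1  # base substitution, e.g. "C>T"

    identity = 100.0 * matches / columns if columns else 0.0
    return identity, spectrum


if __name__ == "__main__":
    ident, spec = alignment_stats("ACGTCGAT", "ACGTTGAT")
    print(f"{ident:.3f}% identity", dict(spec))
```

In this toy alignment, a single C>T substitution yields 87.500% identity. For real data, the counts would come from a whole-genome alignment of the assembled numt against the mitochondrial genome. Complementary types such as C>T and G>A may also need to be pooled, depending on strand.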