Xian Wenfei, Bezrukov Ilja, Bao Zhigui, Vorbrugg Sebastian, Gautam Anupam, Weigel Detlef
Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany.
Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany.
Mol Biol Evol. 2025 Jan 6;42(1). doi: 10.1093/molbev/msae247.
Plant cells have two major organelles with their own genomes: chloroplasts and mitochondria. While chloroplast genomes tend to be structurally conserved, the mitochondrial genomes of plants, which are much larger than those of animals, are characterized by complex structural variation. We introduce TIPPo, a user-friendly, reference-free tool that assembles organellar genomes from PacBio high-fidelity long-read data without relying on genomes from related species or on nuclear genome information. TIPPo employs a deep learning model for initial read classification and leverages k-mer counting for further refinement, significantly reducing the impact of nuclear insertions of organellar DNA on the assembly process. TIPPo completely assembled every chloroplast genome in a set of 54; no other tool was able to completely assemble this set. For mitochondrial genomes, TIPPo is comparable with PMAT for most species and achieves higher completeness for several. We also used the assembled organelle genomes to identify nuclear insertions of plastid DNA (NUPTs) and of mitochondrial DNA (NUMTs). The cumulative length of NUPTs/NUMTs correlates positively with the size of the nuclear genome, suggesting that insertions occur stochastically. NUPTs/NUMTs show predominantly C:G to T:A changes, with the mutated cytosines typically found in CG and CHG contexts, suggesting that degradation of NUPT and NUMT sequences is driven by the known elevated mutation rate of methylated cytosines. Small interfering RNA loci are enriched in NUPTs and NUMTs, consistent with the RNA-directed DNA methylation (RdDM) pathway mediating DNA methylation in these sequences.
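The k-mer counting refinement mentioned above can be illustrated with a minimal sketch. This is not TIPPo's implementation; it only shows the general idea that reads from high-copy organellar genomes consist of k-mers seen many times in the read set, whereas reads from low-copy nuclear insertions do not. The function names, the value of `k`, and the `min_median` threshold are all hypothetical, and the sketch ignores reverse complements and sequencing errors for brevity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k):
    """Count k-mer occurrences across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def median_kmer_abundance(read, counts, k):
    """Median count of the k-mers in one read; 0 if the read is shorter than k."""
    values = [counts[km] for km in kmers(read, k)]
    return median(values) if values else 0


def filter_high_copy_reads(reads, k=5, min_median=3):
    """Keep reads whose median k-mer abundance reaches the threshold.

    Assumption for illustration: high-copy (organellar) reads have a high
    median k-mer abundance, low-copy (nuclear) reads a low one.
    """
    counts = count_kmers(reads, k)
    return [r for r in reads if median_kmer_abundance(r, counts, k) >= min_median]


# Toy data: one sequence present five times, another present once.
reads = ["ACGTTGCAAGCT"] * 5 + ["GGGTCCCATATA"]
kept = filter_high_copy_reads(reads, k=5, min_median=3)
```

In this toy input, `kept` contains only the five copies of the repeated sequence. A realistic filter would use much longer k-mers, canonical (strand-independent) k-mers, and a threshold derived from the observed abundance distribution.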