Department of Molecular Bioscience, Kangwon National University, Chuncheon, 24341, Korea.
Department of Horticulture, Kangwon National University, Chuncheon, 24341, Korea.
Genes Genomics. 2020 May;42(5):553-570. doi: 10.1007/s13258-020-00923-x. Epub 2020 Mar 21.
Chloroplasts are a common feature of plants. During plant evolution, the chloroplasts of each lineage have shaped their own genomes (plastomes) through structural changes and the transfer of many genes to the nuclear genome. Some plastid genes contain introns, most of which are group II introns.
This study aimed to gain genomic and evolutionary insights into plastomes from green algae to flowering plants.
Plastomes of 115 species from green algae, bryophytes, pteridophytes (spore-bearing vascular plants), gymnosperms, and angiosperms were retrieved from the NCBI organelle genome database. Plastome structure, gene content, and GC content were analyzed with in-house Python code. Intronic features, including presence/absence, length, and intron phase, were analyzed manually from the annotation records in NCBI.
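The GC content analysis described above can be illustrated with a minimal sketch. This is not the authors' in-house code, which is not shown in the abstract; the function names `gc_content` and `summarize` and the toy sequences are hypothetical, and the sketch assumes each plastome is available as a plain nucleotide string.

```python
import statistics


def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted, so ambiguity
    codes such as N do not affect the result.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


# Toy sequences standing in for plastomes (hypothetical data).
toy_plastomes = ["ATGCATGC", "AATTAATT", "GGCCAATT"]
per_genome = [gc_content(s) for s in toy_plastomes]
mean_gc, sd_gc = summarize(per_genome)
```

Reporting the per-genome values as a mean with a standard deviation matches the form of the summary statistics given in the results, although the abstract does not state which deviation estimator was used.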
The canonical quadripartite structure was retained in most plastomes, except for a few that had lost an inverted repeat (IR). Expansion, reduction, or deletion of IRs resulted in length variation among the plastomes. The number of protein-coding genes ranged from 40 to 92, with an average of 79.43 ± 5.84 per plastome, and gene losses were apparent in specific lineages. The number of trn genes ranged from 13 to 33, with an average of 21.19 ± 2.42 per plastome. Ribosomal RNA genes (rrn) were located in the IRs and were therefore present in duplicate, except in species that had lost one of the IRs. GC content varied from 24.9 to 51.0%, with an average of 38.21 ± 3.27%, indicating a bias toward high AT content. Plastid introns were present in 18 protein-coding genes, six trn genes, and one rrn gene. Intron losses occurred among orthologous genes in different plant lineages. The plastid introns were long compared with nuclear introns, which might be related to the difference between spliceosomal nuclear introns and self-splicing group II plastid introns. The trnK-UUU intron contained the maturase-encoding matK gene, except in the chlorophyte algae and monilophyte ferns, in which trnK-UUU was lost but matK was retained. There were many annotation artefacts in the intron positions in the NCBI database. In the analysis of intron phases, phase 0 introns were more frequent than phase 1 and phase 2 introns. Phase polymorphism was observed in the introns of clpP and was derived from nucleotide insertion. Plastid trn introns were long compared with archaeal or eukaryotic nuclear tRNA introns. Of the six plastid trn introns, one was at the D loop and the other five were at the anticodon loop. The insertion sites were conserved among archaeal, eukaryotic nuclear, and plastid tRNA genes.
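Intron phase follows the standard definition: the number of coding nucleotides upstream of the intron, modulo 3. Phase 0 lies between two codons, phase 1 after the first base of a codon, and phase 2 after the second. The sketch below is an illustration only, not the authors' procedure (the abstract states that phases were analyzed manually); the function name `intron_phases` and the exon lengths are hypothetical.

```python
def intron_phases(exon_lengths):
    """Return the phase of each intron in a protein-coding gene.

    exon_lengths: coding lengths of the exons in 5'-to-3' order,
    starting at the first base of the start codon. An intron's phase
    is the cumulative coding length upstream of it, modulo 3.
    """
    phases = []
    upstream = 0
    # There is one intron after every exon except the last.
    for length in exon_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases


# Hypothetical three-exon gene with two introns.
reference = intron_phases([6, 4, 5])

# The same gene with a single-nucleotide insertion in the first exon:
# every downstream intron shifts phase by one.
with_insertion = intron_phases([7, 4, 5])
```

In this toy case a one-base insertion upstream shifts both introns to a different phase, which is the kind of change that can produce phase polymorphism between orthologous introns such as those reported for clpP.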
The current study revisited and updated previous findings on the structural variation, gene content, and GC content of chloroplast genomes from green algae to flowering plants. It also presented some novel findings and discussion on plastome introns, including their length variation and phase variation. We also identified and corrected some false intron annotations in protein-coding and tRNA genes in the genome database; these corrections might be confirmed by chloroplast transcriptome analysis in the future.