Guo Yan-Yan, Yang Jia-Xing, Li Hong-Kun, Zhao Hu-Sheng
College of Plant Protection, Henan Agricultural University, Zhengzhou, China.
Front Plant Sci. 2021 Feb 9;12:609729. doi: 10.3389/fpls.2021.609729. eCollection 2021.
The size of the chloroplast genome (plastome) of autotrophic angiosperms is generally conserved. However, the plastomes of some lineages are greatly expanded, which can make assembling them from short-read sequencing data more challenging. Here, we present the sequencing, assembly, and annotation of the chloroplast genomes of two orchid species from the same genus. We assembled both plastomes from a combination of short-read Illumina data and long-read PacBio data. The plastomes of the two species are characterized by expanded genome size, proliferated AT-rich repeat sequences, low GC content and gene density, and low substitution rates in the coding genes. At 197,815 bp and 212,668 bp, they are substantially larger than those of the three species sequenced in previous studies, and the larger of the two is the longest plastome reported in Orchidaceae to date. Despite the increase in genome size, gene order and gene number are conserved, with the exception of a large inversion of ∼75 kb in the large single copy (LSC) region that is shared by the two species. Most striking is the record-setting low GC content (28.2%) of the larger plastome. Moreover, plastome expansion in the two species is strongly correlated with the proliferation of AT-biased non-coding regions: the non-coding content of the larger plastome exceeds 57%. The genus thus provides a typical example of plastome expansion driven by the expansion of non-coding regions. Considering the pros and cons of different sequencing technologies, we recommend hybrid assembly based on long and short reads for sequencing plastomes with AT-biased base composition.
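The GC content figure quoted in the abstract (28.2%) is the share of G and C bases among all unambiguous bases in the plastome. As a minimal sketch of that calculation, assuming the sequence is already available as a plain string: the function name `gc_content` and the toy sequence are illustrative only and are not taken from the study.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Ambiguous symbols (e.g. N) are ignored, so the denominator
    counts only A, C, G, and T.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy AT-rich sequence, not real plastome data.
    toy = "ATATTTAAGCATTAATTTACGA"
    print(f"GC content: {gc_content(toy):.1f}%")
```

Applied to a full plastome sequence read from a FASTA file, the same function would give the genome-wide value; restricting it to annotated non-coding intervals would show how AT-biased those regions are.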