Institut für Botanik, Technische Universität Dresden, Dresden, Germany.
Departamento de Botánica, Universidad Nacional Autónoma de México, Mexico City, Mexico.
Am J Bot. 2024 Mar;111(3):e16300. doi: 10.1002/ajb2.16300. Epub 2024 Mar 12.
Many plastomes of autotrophic Piperales have been reported to date, and these reports describe a variety of differences among them. Most studies have focused on only a few species or a single genus, and extensive comparative analyses are lacking. Here, we reviewed publicly available plastome reconstructions for autotrophic Piperales, reanalyzed publicly available raw data, and provided new sequence data for all previously missing genera. We performed comparative plastome genomics of >100 autotrophic Piperales.
We performed de novo assemblies to reconstruct plastomes from the newly generated sequence data. We used Sanger sequencing and read mapping to verify the assemblies and to bridge assembly gaps. Furthermore, we reconstructed the phylogenetic relationships as a foundation for comparative plastome genomics.
We identified numerous assembly and annotation issues in published plastome data, which, if left uncorrected, will artificially inflate apparent diversity. We detected patterns of missing and incorrect feature annotation and determined that the inverted repeat (IR) boundaries were the major source of erroneous assembly. After accounting for these issues, we found that the junctions between the IRs and the small single-copy (SSC) region are relatively stable, whereas most plastome variation among Piperales stems from fluctuations of the boundaries between the IRs and the large single-copy (LSC) region.
This study of all available plastomes of autotrophic Piperales, expanded by new data for previously missing genera, highlights the IR-LSC junctions as a potential marker for discrimination at various taxonomic levels. Our data indicate a pseudogene-like status for cemA and ycf15 in various Piperales. Based on a review of published data, we conclude that incorrect IR-SSC boundary identification is the major source of erroneous plastome assembly. We propose a gold standard for the assembly and annotation of high-quality plastomes based on de novo assembly methods and appropriate references for gene annotation.
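To illustrate the kind of IR boundary check discussed above, here is a minimal Python sketch. It is not the authors' pipeline. It assumes a circular plastome given as an uppercase ACGT string, and annotated IR coordinates given as 0-based half-open intervals. The function name `refine_ir_boundaries` and the toy sequence are hypothetical. The function counts mismatches between the two annotated IR copies and then extends the boundaries outward for as long as the flanking bases still pair as reverse complements, which flags annotations where the IR was drawn too short.

```python
# Hypothetical helper for illustration only; not taken from the study.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(s):
    """Reverse complement of an uppercase ACGT string."""
    return "".join(COMPLEMENT[base] for base in reversed(s))


def refine_ir_boundaries(seq, irb, ira):
    """Check and extend annotated IR copies on a circular sequence.

    irb, ira: (start, end) tuples, 0-based, half-open, with IRb before IRa.
    Returns (irb, ira, mismatches). Coordinates may leave the range
    [0, len(seq)) if the extended IR crosses the sequence origin.
    """
    n = len(seq)
    b_start, b_end = irb
    a_start, a_end = ira
    copy_b = seq[b_start:b_end]
    copy_a = seq[a_start:a_end]
    if len(copy_b) != len(copy_a):
        raise ValueError("annotated IR copies differ in length")

    # Mismatches between IRb and the reverse complement of IRa.
    mismatches = sum(x != y for x, y in zip(copy_b, revcomp(copy_a)))

    def pairs(i, j):
        # True if the bases at i and j (circular indices) are complementary.
        return COMPLEMENT.get(seq[i % n]) == seq[j % n]

    # Extend toward the LSC: the base left of IRb pairs with the base right of IRa.
    while (b_start + n) - a_end >= 2 and pairs(b_start - 1, a_end):
        b_start -= 1
        a_end += 1

    # Extend toward the SSC: the base right of IRb pairs with the base left of IRa.
    while a_start - b_end >= 2 and pairs(b_end, a_start - 1):
        b_end += 1
        a_start -= 1

    return (b_start, b_end), (a_start, a_end), mismatches


if __name__ == "__main__":
    # Toy layout: LSC + IRb + SSC + IRa, with the IR annotated two bases short on each side.
    ir = "GATTACAGGT"
    toy = "ACCACAACCA" + ir + "CACCAAC" + revcomp(ir)
    print(refine_ir_boundaries(toy, (12, 18), (29, 35)))
```

In practice the sequence and the annotated coordinates would come from a GenBank record, and a real check would also need to tolerate ambiguous bases and small indels between the two IR copies, which this sketch does not.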