Giorgashvili Eka, Reichel Katja, Caswara Calvinna, Kerimov Vuqar, Borsch Thomas, Gruenstaeudl Michael
Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany.
Institute of Botany, Azerbaijan National Academy of Sciences (ANAS), Baku, Azerbaijan.
Front Plant Sci. 2022 Jul 6;13:779830. doi: 10.3389/fpls.2022.779830. eCollection 2022.
Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact of sequencing coverage and assembly software on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in a threatened and rare endemic shrub. We aim to characterize the differences among plastid genome assemblies generated with different assembly software tools and levels of sequencing coverage, and to determine whether these differences are large enough to affect the phylogenetic position inferred for this species relative to its congeners. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic reconstruction is also assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produces the most consistent assemblies for this species. Moreover, we demonstrate that a sequencing coverage between 500x and 100x can reduce both the sequence variability across assembly contigs and the computation time. When comparing the most reliable plastid genome assemblies of this species, a sequence difference in only three nucleotide positions is detected, which is less than the difference potentially introduced through software choice.
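The coverage levels compared above imply a downsampling step. The abstract does not describe how it was carried out, so the following is only a minimal sketch of the underlying arithmetic, assuming coverage is defined as (number of plastid reads x read length) / plastid genome length. The function names, the genome length of 160,000 bp, and the read length of 150 bp are hypothetical placeholders, not values from the study.

```python
import random

# Coverage levels named in the abstract (the original depth is left as-is).
TARGET_COVERAGES = [2000, 1000, 500, 250, 100, 50]


def reads_for_coverage(target_coverage, genome_length, read_length):
    """Number of reads needed so that reads * read_length / genome_length
    is approximately equal to target_coverage."""
    return round(target_coverage * genome_length / read_length)


def subsample_fraction(target_coverage, current_coverage):
    """Fraction of reads to keep to go from current to target coverage.
    Capped at 1.0 because subsampling cannot increase coverage."""
    return min(1.0, target_coverage / current_coverage)


def subsample(reads, fraction, seed=0):
    """Randomly keep the given fraction of reads (seeded for reproducibility)."""
    rng = random.Random(seed)
    keep = round(len(reads) * fraction)
    return rng.sample(reads, keep)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    genome_length = 160_000  # assumed plastid genome length in bp
    read_length = 150        # assumed read length in bp
    for cov in TARGET_COVERAGES:
        n = reads_for_coverage(cov, genome_length, read_length)
        print(f"{cov}x -> about {n} plastid reads")
```

Because only part of a whole-genome read set originates from the plastid, the read counts here refer to plastid-derived reads; applying the same fraction to the full read set is one way to reach the target plastid coverage, under the assumption that plastid reads are sampled proportionally.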